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FITTING THE LOGISTIC BY MAXIMUM LIKELIHOOD* 


J. L. Hopces, Jr. 
University of California 
Berkeley, California, U.S.A. 


Summary. We present a technique, called the transfer method, 
which simplifies the work of fitting a logistic response curve to quantal 
data by maximum likelihood. The method, which does not require the 
use of a computing machine or of tables or charts, nor that the number 
of trials at each level be the same, quickly will yield the maximum 
likelihood estimates to the accuracy with which a graph may be read. 
When these values are used to initiate the customary iterative solution, 
one cycle of iteration will produce accuracy otherwise usually attained 
only after two, three, or four cycles. The transfer method also gives 


good starting values for fitting the integrated normal by maximum 
likelihood. 


1. Introduction 


In many experiments involving quantal response to a graded 
stimulus, the results agree reasonably well with the binomial logistic 
model. Specifically, if there are R,; (positive) responses when n, trials 
are made at (stimulus or dose) level x; ,7 = 1, --- , k, the model assumes 
that the R,; are independent binomial random variables with probability 
of response 


P,(a, B) = 1/[1 (1) 
at level xz; . For a discussion of the model and further references, see 
Berkson [1955]. 

A variety of methods for estimating the parameters a and 8 have 
been considered; we are here concerned only with the maximum likeli- 
hood estimates, which are the solutions, if any, of 


(2) 
k 
ar = (&, B) 


*This investigation was supported in part by Research Grant G-30066 from the National Institutes 
of Health, U.S. Public Health Service. 
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where r,; is the observed value of R; . These cannot be solved ex- 
plicitly, but there are methods [Berkson 1957] for obtaining iteratively 
a series of approximations which converge to the solution, if it exists. 

The computational labor of the iterative solution is considerable, 
and interest has centered on devices to obviate it. If the number of 
trials at each dose level is the same, say m, = --- = ™ = n, We May 
divide equations (2) by n, and write the first of them as }>*_, p; = 

*_, P,(@, 8) where n,p; = r;. By combining this equation with the 
second, we can obtain >>*_, (cx; + d)p; = (ex; + d)P;(@, 8). 
We can choose c and d to convert the system of levels to a standard 
form, for example, cr; + d = i. It now appears that, with k equally 
spaced levels and equal numbers of trials at each level, & and 8 depend 
only on k, >>*_, p; , and >-*_, ip, . Thus for each k it is possible to 
provide tables or nomograms for @ and $. Worcester and Wilson [1943] 
have published tables for k = 3, and Berkson [1953] has published an 
illustrative nomogram for 4 = —@/f, fork = 4. Ina private communi- 
cation, Dr. Berkson has informed me that he has prepared and expects 
to publish nomograms for 7, 8, ¢;, for k = 2(1)6. 

Pending the availability of Berkson’s charts, or even with those 
charts when the numbers are unequal or the levels not evenly spaced, 
there is a tendency to resort to visual fitting on a logistic scale, or more 
conveniently on logistic graph paper [Berkson 1951]. Let the logit 


’ 1, be defined by 1; = Im [r;/(n; — 1;)].. We may plot the points (z; , /,) 


and fit to them a straight line by eye. Estimates can then be read from 
this line if its equation is taken to be 1 = a* + f*x. The estimates 
a* and 6* thus obtained are of course not the maximum likelihood 
estimates, or even well-defined estimates at all, because of the sub- 
jective element in visual fitting [Berkson 1957]. 

To throw some light on the magnitude of this subjective element 
in our problem, a small experiment was conducted. The data of 
Example 3, which has been used as an illustrative example by Fisher 
and Yates [1943] and by Berkson [1957], was plotted on logit paper, 
and shown separately to ten statisticians, who were asked to make a 
careful visual fit of a straight line. From these lines, the following 
values for 8 and y = —a/ were read:* 

B: 0.87, 1.03, 1.04°, 1.11, 1.20, 1.29, 1.30, 1.41, 1.47, 1.70, 1.74, 2.63 

7: 6.10, 6.43, 6.46, 6.46, 6.47, 6.52, 6.54, 6.62, 6.64, 6.64, 6.64, 6.67°, 

6.82. 


*In assessing accuracy it is better to use y, which is the “L.D.50” or level at which P = 3}, rather 
than a. For given precision of fit, the error of a can be made arbitrarily large by choice of origin on 
the z-axis, while the error of + is invariant under change of origin, as is the relative error of 6 under 


change of scale. 


3 
af 
¥ 
4 


FITTING THE LOGISTIC 455 


For comparison, the maximum likelihood estimates are shown, marked 
with *, and also the initial values used by Berkson, marked with °. 
It is clear that the variation from person to person is substantial, 
especially for 8; one of the lines is more than three times as steep as 
another. It also appears that the initial values used by Berkson (which 
were adapted from those of Fisher and Yates, and for which three 
cycles of iteration were required to produce three-figure accuracy) are 
considerably closer to the final estimates than some that may arise in 
practice. Thus, we must expect sometimes to need more than three 
cycles to attain a three-figure result. 


2. The transfer method 


The method here proposed to ameliorate these difficulties is based 
on three observations, the first two of which have frequently been 
made. 

(i) There is a simple set of minimal sufficient statisties for a and 8, 
namely 


k 
T= (3) 
t=1 i=l 
(ii) The maximum likelihood estimates are functions of S and T, 
and therefore have the same values for the observed r; and for any 
other set r/ satisfying 


k k k k 
s= Dr, t= Yar, = (4) 
t=1 t=) 
We shall say that the r; and the r’ are equivalent. 

(iii) If we choose an equivalent set r{ such that, for the correspond- 
ing li = Inrt/(n; — r/), the points (2; , 1!) are collinear, the values of 
the maximum likelihood estimates may be read from the line l’ = & + 
6x; drawn through these points. 

The problem then becomes: How to produce an equivalent set with 
the collinear property? The value of s is preserved if the total number 
of responses is left unchanged, which will happen whenever we transfer 
responses from one level to another. For the value of t to be preserved, 
these transfers must not alter the “center of gravity” of the responses. 
For example, if the levels are equally spaced, as is usually the case, we 
may transfer one response from a given level to each of those to either 
side: this transfer may be denoted as (+ 1, —2, + 1). In the same 
notation, other simple transfers preserving the value of t are (+ 1, 0, 
~ 8, + (+ +3.- (+ 1,0, — 2,0, + 1), (+ | 
— 1, + 1), or any of these with the signs changed, or any multiple of 
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the preceding. In general, whether or not the levels are equally spaced, 
the transfers are of the form (a, , --- , @) where 


k k 
= > 2,0; =), 


i=1 t=1 

Suppose now that the original data are plotted on logit paper, and 
a straight line is fitted by eye. Some points will be above, others below 
the line. A pattern of transfer which would improve the collinearity 
of the points can now be found, and an appropriate multiple of it 
determined so as to bring the points near to the provisional line. The 
adjustments are made in the response numbers, producing new values 
r; . These may again be plotted, and the process repeated, always 
concentrating on the points farthest from the provisional line. As is 
shown on the examples below, after three or four transfers, the points 
will be collinear to graphical accuracy. The line may be drawn through 
the points, and the parameter values read from it. The whole process 
takes about five minutes, with the only arithmetical operations being 
simple additions and subtractions. 

For some purposes, for example in the publication of a definitive 
experiment, graphical accuracy may not suffice. In such cases, the 
values obtained from the transfer method may be used as starting 
values in the arithmetical iterative solution. If the graphing is carefully 
done, these values are about as good as those resulting from the second 
cycle of the conventional iteration, so that usually only one cycle of 
iteration will be needed when the transfer method has been used first. 


3. Examples 


Various refinements and details are best presented in the context of 
examples, which can only be followed clearly if the reader plots the 
successive equivalent sets. This is most easily done on logit paper, or 
failing this with the aid of a table of logits [Berkson 1953]. 


Example 1 


When we plot the data of Table 1, we see that /, is high while /, and 
l, are low, so that the transfer pattern (+ 1, — 2, +1) is called for. To 
determine the appropriate magnitude, we lay a transparent straight 
edge on the paper and make a visual fit; it appears that r, should be 
near 4, so we use unit magnitude. The points (z; , li) are nearly collinear, 
but a further lowering of r. by about 0.3 is indicated.* The (z; , 1’) 
are collinear to graphical accuracy, the line is drawn, and from it we 


*The reader may be disturbed by our use of fractional corrections, inasmuch as the ri and ni are 
integers. Since, however, the problem is unaltered if all ni and r; are multiplied by the same constant, 
we may imagine all values multiplied by an appropriate power of ten, and use as many decimal places 
as we like. Equivalently, we may regard the rj as continuous variables. 
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read the values 6 = 0.62, 7 = 0.51, where the final pkices are not 
certain. 

When there are only three levels, our method admits a simple 
arithmetical extension, which is in this case easier than the iterative 
method, and which gives any desired accuracy. The logits 1; are 
collinear if and only if their second difference D = 1, — 2l, + l,; vanishes. 
Vor the r’’ , we find D” = 0.0129, while for the r/’’, D’” = —0.0125. 
Interpolation gives the and hence = (Ii? — = 0.631, 
a= = —0.345,7 = = 0.547. 


Example 2 


Table 2 illustrates the fact that the transfer method may also be 
used when the n; are not equal. The /’’ are here already close enough 
to collinearity for many purposes, but the refinement to r/’’ is easy. 
The graphical estimates 8 = 1.18, 7 = 1.06 may be compared with the 
values 8 = 1.1695, 7 = 1.0617 obtained after they are subjected to one 
cycle of Berkson’s iterative method. These are in error by at most 1 in 
the fifth figure. 


Example 3 


For our final example we take one which has been used by Fisher 
and Yates [1943] and Berkson [1957]. This example illustrates how 
easily the transfer method deals with the troublesome case of zero or 
100 per cent response. In this example, 8 trials were made at each level. 
To simplify the calculation of the percentages, we multiply all values 
of n,; and r, by 125 before starting the work. 

From the fitted line, we see that desirable values for r, and r, would 
be about 10 and 40; ‘we take these responses from the high r; , and at 
the same time transfer 30 from r; to the low r; to preserve the value of 
t. Similarly we can bring r; to a desirable 950 by transferring 50 from 
there to r; , compensating by a similar shift from r; tor; . These pre- 
liminary transfers produce the r/ , which are plotted. 

It now appears that r{ is still low by about 100, which can usefully 
be taken from r, and rg. The r’’ have a plot which is nearly collinear, 
though a careful inspection of the graph suggests a few minor additional 
transfers. From the line through the (z; , l/’’”), we read 8 = 1.21, 
£50 = 6.65. These values are in error by 0.006 and 0.006, while the 
values Berkson obtains after two cycles of iteration are in error by 
—0.004 and 0.022. 


4. Conclusions 


We conclude with a number of remarks on the method presented 
above. 
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(i) We have limited the exposition to the case when the levels are 
equally spaced, because such levels are customary and because the 
method is simplest in this case. The method may still be used when the 
differences between successive levels are incommensurable, but the 
t-preserving transfer patterns are no longer as simple. Il*or example, 
when transferring from level 0 to levels —x < 0 and y > 0, we must 
use a multiple of'(—y, x + y, —2). 

(ii) We have here been concerned only with the technique of maxi- 
mum likelihood estimation, and not with its desirability for fitting the 
logistic. It should be mentioned that there is substantial evidence that 
the minimum logit chi-square estimates [Berkson 1955] have for this 
problem smaller expected squared errors than do the maximum likeli- 
hood estimates. 

(iii) The maximum likelihood estimates fail to exist for certain 
values of the 7, . Appropriately enough, it turns out in these cases 
that there do not exist t-preserving transfers which make all values of 
l; finite. 

(iv) When Berkson’s charts, described in Section 1, become avail- 
able, the estimates may be more easily obtained from the charts than 
by the transfer method, provided there are not more than six equally 
spaced levels, and provided the same number of trials is made at each 
level. 

(v) Our method requires that there exist simple sufficient statistics, 
and thus is not directly applicable to the problem of fitting the inte- 
grated normal. The integrated normal and logistic curves are however 
so nearly alike, that even if a normal curve analysis is intended, a 
preliminary fit of the logistic by the transfer method (or with Berkson’s 
charts) will be profitable, as it will provide good starting values for the 
iterative machinery. 

(vi) Because the probit transformation is such a close approxima- 
tion to the logit transformation, one may use probit paper with the 
method instead of logit paper, unless high accuracy is needed. — 

(vii) Instead of using a visually fitted line when making the initial 
adjustments, one may use any of the extant quick methods, such as the 
Spearman-Karber technique. 

(viii) Our method assumes that the observations follow the logistic 
binomial model, and we may wish to test this assumption. The fitted 
line may be used in carrying out a chi-square test of k — 2 degrees of 
freedom, where of course the original unadjusted data would be used. 
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MULTINOMIALLY GROUPED RESPONSE TIMES 
FOR THE QUANTAL RESPONSE BIOASSAY’ 


Rosert F. Wuire ann Josern Gi. GRaca 


Slalistical Laboratory and Department of Veterinary Physiology and Pharmacology 
Towa State College 
Ames, 


SUMMARY 


A method is presented for analyzing quantal response data in 
which the response times have heen ‘‘multinomially” grouped into. inter- 
vals by the fact that each individual has been observed only at seversl 
pre-specified times, rather than continuously. Particular attention 
is given to the model 


y=a+t Br + yit Sat 


where z and ¢ are the values of the dose and time metameters and y is 
say the logit of the response at that dose and time and where in some 
cases 6 may be assumed to be zero. Since the response numbers are 
accumulated for the successive times within a dose, the observed y 
values are not independent within deses. This introduces some com- 
plexity into the analysis which is at least partly resolved by the use 
of an outlined minimum modified chi-square method for estimating a, 
8, y, 6. Methods of choosing metameters are discussed and an example 
is given in which the dose metameter is log dose and the time metameter 
is the reciprocal of the square root of time. 


1. INTRODUCTION 
Consider a graded stimulus which, when administered to individuals 
in a given population, causes a quantal (all-or-nothing) response. A 
relation is conceived, 
G(p, &, 7) = 0, (1) 


where p is the proportion of the population which would respond to an 
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amount (or dose) — of stimulus within time 7. In some cases, as when 
insects ure exposed to various levels of a fumigant for various times, 
the time 7 is part of the treatment and may be called exposure time. 
At the other extreme, the stimulus is given as a mass dose at one point 
in time and 7 is analogous to survival time. In general, it may be referred 
to_as response time. 

The analysis of quantal response data usually amounts to the 
estimation of the relation (1). Suppose doses ¢, , --- , &,, are adminis- 
tered to n, , -:* , m individuals, respectively. At any time 7 after 
administration of dose £; the response observed is the ratio of the number 
of individuals which have thus far responded to dose £; to the total 
number n, exposed. Experimental approaches to this problem may 
be classified according to the frequency of observation of the group 
of n; individuals in a dose group: once, several times, or continuously. 
Consider the following experimental procedures. 

A. Observation at one time only. A single response time is specified; 
the experimenter counts an individual as having responded only if it 
responded at or before that time. For the 7-th dose, the “observation” 
is py = r;/n; where r; is the number which responded in the specified 
time. 

B. Observation at several times with “binomial” grouping of response 
times. The n, individuals to be exposed to dose ¢; are randomly divided 
into k; sub-groups of , Mix, individuals each (>; = n,). 
Times 7;; < -++ < ure pre-specified. At time r,; , the n,; individuals 
in the j-th sub-group are examined and the response ratio p;; = 15;/n;; 
is observed where 7,; is the number which have responded in the j-th 
sub-group. A sub-group is examined only once, at its pre-specified 
time. The actual response times of the j-th sub-group are known only 
to be in the interval (0, 7;;) and are thus necessarily grouped in a 
manner which we call binomial. 

C. Observation at several times with “multinomial” grouping of 
response times. Times 7,4, < +++ < Ti, are pre-specified for dose é; 
as in procedure B, but in this case the n, individuals are all examined 
at every one of these times. The observed response ratio at the j-th 
time is pj; = 1i,;/n, Where r;, is the cumulative response up to time 
t.4;- The actual response time of a responding individual is thus known, 
more precisely than in procedure B, to be in some interval (7; ,;-1 , Ts;)- 
The response times are thus grouped in a manner we call multinomiai. 

D. Continuous observation.. The response times are observed 
exactly by continuous observation from the point of stimulus to the 
point of response. Thus suppose the n, individuals exposed to dose é; 
are observed to respond at times < Now at time 7,; ,) 
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individuals have responded to dose Henee the n; triples (1/n, , 
sumpling variation, points on the surface (1) and may be used to 
estimate G. It is to be noted that this procedure, unlike procedures 
A, B, C, takes the response ratio as a controlled (or independent) variable 
and the response time as an uncontrolled (or dependent) variable. 

It. is with procedure C that this paper is concerned. It is obvious 
that the procedures are listed in the order of increasing information 
about G to be obtained from a given number of individuals. Procedure 
A is perhaps the most familiar in quantal response analysis. If the 
specified time is 7, , this procedure cannot give information about the 
surface (1) but only about the curve 


Gip, = 0. (2) 


In many situations, this is adequate. Thus a muscle contraction 
response under electrical stimulus does not usually require a precise 
statement of the response time. If G@ is estimated as G then that value 
for which 


G(3,E, = 0 


is the estimated HDs (effective dose fifty), the dose which, within time 
Ty , ciuses 2 response in 50 per cent of the individuals to which it is 
given. The functional dependence of this quantity on 7. may or may 
not be very important. 

As compared to procedures B and C, procedure D has certain 
disadvantages which may offset its greater informativeness. If for 
some reason not all individuals respond, truncation problems (see 
Sampford |1954]) are encountered. Even in cases such as death where 
the response can always be observed eventually, there are problems of 
recovery from stimulus. The response in such cases may be due to 
natural causes unrelated to the stimulus. Finally, there is the greater 
cost or inconvenience involved in continuous observation especially 
if the responses are spread out over a long time interval. Three papers 
by Sampford [1952, 1952a, 1954] offer what appears to be the best 
current statement of procedure D. 

Procedures B and C both determine triples (p,; , & , 71;) which are 
estimates of points on the surface (1). Elementary considerations 
suggest that a better exploration of the surface will be obtained if the 
observation times are generally less for the higher doses than for the 
iower. That is, if &; < & < < &,, then.take ry, > ta, > > 
, » for example. 

Usually, the fact that procedure C extracts more information from 


. 
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a given number ot individuals than does B makes it the better procedure. 

However, it has certain disadvantages. In the first place it may be 
impossible in some situations to examine an individual more than once, 
for the examination may permanently interrupt the treatment. This 

could be the case in fumigant studies on-insects and is certainly the 

case when the response: is the occurrence of a tumor the existence of 

which can be determined only by sacrificing the individual. Such a- 
consideration would also rule out procedure D, of course. 

A lesser disadvantage of procedure C is its greater computational 
complexity due to the fact that the observed response ratios for the 
various times within a dose are not statistically independent, as they 
are in procedure B. This increased complexity would be serious if © 
estimation by maximum likelihood were followed. One purpose of this 
paper is to resolve this complexity to some extent by outlining minimum 
modified chi-square estimation. for procedure C. 

Tor an example of maximum likelihood estimation in procedure B, 
see Finney [1952, p. 117]. Sampford [1952] has an example of maximum 
likelihood estimation in procedure C with what amounts to a single 
dose. 

We have stated above that in procedures B and C, the t time is an 
independent or controlled variable and this may appear surprising, 
particularly if the time is a survival as opposed to an exposure time. As 
matter of fact, the time is not strictly a response time as in pro- 
cedure D, but is the time at which the response is observed. Since the 
times of observation perform the function of dividing the actual response 
times into groups, and since the observed response ratio at a given 
time is an unbiased estimate of the population value for that time, no 
ambiguity is involved. The fact that the dependence or independence 
of a variable is not an intrinsic property but is determined by the 
manner in which observations are made is well illustrated by the cases 
of direct and indirect assays (Finney [1952a]). In the direct assay, the 
amount of stimulus just necessary to produce a given response is observed 
us a depe ndent variable. In the indirect assay, the responsé produced 
by a given amount of stimulus is observed, and the stimulus amount 
is an independent variable. 


2. APPLICATION OF MULTINOMIAL GROUPING TO THE QUANTAL 
RESPONSE BIOASSAY 
Suppose that equation (1) ean be written in the form 
» = ft, 2). (3) 


It, is often possible to transform p — y and — — x so that for fixed 7 (as 


| 


= 
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in procedure A), (3) becomes 


y=a, + Bx (4) 


(see, for example, Bliss [1935]).. Frequently (but see Finney [1952a], 
page 66) x = log ~ suffices for the dose transformation. Two widely 
used transforms of the responses (p — y) are the probit and the logit 
(Berkson (1951], Finney [1952]). 

The development of equation (4) is usually the first step in probit 
analysis or its analogs. [xperimentally, groups of: individuals are 
observed at some fixed time after stimulus. The observed y’s are plotted 
against various metameters xz in an attempt to find that metameter 
which gives approximate linearity to the plotting. 

Similarly, the dosage could be fixed and the responses at various 
times obtained. This relation may be linearized in much the same way 
as the response-dose relation (Bliss [1937]), although the transformation 
may not be log time (see Box and Cullumbine [1947]). Thus we have 


= + Bt (5) 


where ¢ is the time metameter, a transform of r. 
Now if y is a linear function of x for fixed ¢ and a linear function of ¢ 


- for fixed x, then (see Finney [1952], pp. 104 and 116) 


y=at yt + oat (6) 


is the most general expression for y. Hence if dose and time are simul- 
taneously varied, equation (6) should give a reasonable model, with the 
possibility that in some cases 6 may be assumed to be zero. A difficulty 
with this model arises from the fact that some values-of x could exist 
for which the response would appear to be a decreasing function of 
time, which is, at least in the usual cases of irreversible response, a 
logical impossibility. This difficulty can arise only if 6 # 0. The only 
way to resolve this difficulty is, of course, to conclude that (6) is only 
an approximation to the true relation over some restricted range of 
xandt. . 

A biological explanation of non-additivity of dose and time (6 # 0) 
might be constructed in some cases by postulating that the stimulus 
is effectively a mixture of two or more stimuli. Then, if these stimuli 
are independent in the sense ‘of Plackett and Hewlett [1952] in that 
they react on different physiological systems, we can conceive that the 
rate of change of response per unit change of dose changes in time 
beeause of changing relative importance of the two or more stimuli. 
It is not within our present purpose to consider eauses of non-additivity; 
however, its passible existence must be contemplated. 


i 
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We consider the problem of estimating the parameters a, 8, y, and 
5 by use of experimental procedure C. For the i-th dose we observe 
that: by time 7,; , 7;; individuals (of the n, exposed to dose £;) have 
responded. This gives the k; response ratios Di; = 7.;/n,. Assuming 
the logit transform is used, 


= log.[p:;/(1 — is computed 
and then, we have the non-additive dose-time model, 


where £; — x, and 7,; — t;; , under the dose and time transformations. 
We consider also the additive dose-time model, 


Yi; =< + Ba, + (8) 
After fitting both of these models, the chi-square analysis 


Source df. 

a, 3 

6 (non-additivity) 1 
Deviations 


can be computed. 

If the deviations chi-square is judged to be non-significant, outta 
(7) is considered to describe the data adequately. The choice between 
(7) and (8) as the final form of the response-dose-time surface is then 
made on the basis of whether the non-additivity chi-square is, or is not, 
judged to be significant (as a one degree of freedom chi-square). 

Frequently, the primary goal is to estimate that dose which causes 
a given proportion of response within a given time. Thus, if the given 
proportion is p and the metameter value of the given time is ¢t, then we 
find Y such that p — Y and take 


or B+ ot (9) 


depending on whieh model is chosen. Such values of @ could be trans- 
formed back to ~ (on the basis of the dose metameter). Computation 
of values of £, with their corresponding values of p and 7, appears to be 
the best method of summarizing the estimated response-dose-time 
surface G = 0. In fact, if one wished to compare two lethal compounds 
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say on the basis of their L1),.s (lethal dose 50: dose estimated to kill 
5OY of the individuals to which it is applied), it is obvious that this 
comparison should be made over a range of values of time, for it is 
conceivable that one compound could be more lethal than another 
with respect to an curlier response time and less lethal with respect to 
dater time. 

A significantly large deviations chi-square may be attributed to a 
breakdown in the model (an inappropriate choice of the dose metameters 
and/or the time metameter or to heterogeneity (see Finney [1952}). 
Heterogencity would most likely be due to having the individuals 
within a dose group more nearly alike than are individuals in different 
dose groups. Common causes of this are the practice of testing the doses 
on separate dates and (Gin small animal studies) of putting all the animals 
in one dose group in the same cage with m cages for the m dose groups. 
As much as possible, randomization of assignment of individuals to 
dose groups, of dates, and of cages should be performed to avoid such 
heterogeneity, 

If a significant deviations chi-square is attributed to the choice of 
mnelameters, the fitting would presumably be done anew, with different 
imehumeters. “Phe discussion in the following section is then pertinent, 
On the other hand, if the inflation is ascribed to heterogencity, the test 
of non-additivity seems best made as a variance ratio test, taking 

(Deviations chi-square) ee 
as Fisher's / with | and (> k, — 4) degrees of freedom. 
3. 

Jt appears that the major practical difficulty with this procedure 
lies in choosing appropriate metameters. It is always a good practice, 
at least with data of a form new to the experimenter, to plot the y 
values (say logits) against various metameters before proceeding with 
the formal process of estimation. Tlowever, if one decides, after having 
fitted the data, that the chosen metameters are inappropriate, then 
much of the computations, being independent. of the metameters, en 
be sulvaged for use with fitting with another choice of metsuneters. 

The problem is complicated by the fact that there are two meta- 
meters to choose. However, the preliminary plotting to choose a time 
at least, is rather simple. Thus, for the dose, we have, 
from equation (7), 


ke, , (10) 


a 
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so that, if the cumulative responses (transformed. to logits, say) within 
any one dose are plotted against the appropriate time metameter t, a 
straight. line should result, apart from random error. Since these 
responses are cumulative and hence non-decreasing a8 one proceeds 
along the time axis, one will never have the disconcerting experience of 
having to plot a line with non-decreasing slope through a set of points 
in which one point is higher than a subsequent point, as can happen if 
the plotting is against the doses. 

Consider the class of monotonic increasing transformations of time 
T, given by all real A: 


1 
ty = r, = 0 
= log r, A= 0. (11) 
Then, for all real yu, A, 
* 


which is an increasing or decreasing function of 7 according as » > A 
or » < \. Suppose the appropriate time metameter is t, , where u is 
unknown, and we plot the y’s within a dose against values of ¢, corre- 
sponding to the response-times, where \ is known. (The y’s are plotted 
as ordinates, the 4, as abscissae.) One of three conditions may be 
observed of a smooth curve drawn through the points: 


i. It is concave upward. 
ii. It is essentially a straight line. 
iii. It is concave downward. 


If (i), the slope is increasing so that we know that » > A. If (ii), the 
slope is constant so that we know » = 2. If (iii), the slope is decreasing 
so that » < . We could then choose a new metameter t, by changing 
d in the direction of » and plot again, repeating the process. By thus 
adjusting the value of \ according to the direction and magnitude of 
curvature, we should reach condition (ii) and thus find the appropriate 
time metameter. The metameter should be checked by a separate 
plotting for each of the m doses. It may be possible to justify some 
choice of A on theoretical grounds (see Finney [1952] pp. 172-176), 
but for our present purpose it suffices to regard this as an empirical 
device for linearizing data. It is not necessary to “estimate” \ with 
great accuracy; values \ = —1, —4, 0, 3, 1, will usually be sufficient, so 
that it dees not seem necessary to decrease by 1 the degrees of freedom of 
the residual chi-souare of the preceding section. 
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A somewhat more objective method of choosing the time metameter 
(and eventually the dose metameter) is actually to fit (10) to the data 
(by tainimum modified chi-square, say). We may call this the within- 
doses model. It gives the analysis of chi-square: 


Source df. 


fa; , Bi} 2m 
Deviations within doses p k; — 2m. 


Most of the computing for this model is needed for fitting (7) and (3), 
so that there is little extra work involved. The deviations within 
doses chi-square gives an ‘‘ultimate” test of the adequacy of the time 
metameter, and does not depend on the dose metameter. It is helpful 
in isolating causes of inflation of the deviations chi-square in the 
equations (7) and (8) analysis. 

In any case, after ascertaining the adequacy of the time metameter, 
there are m regression lines of response on time, fitted either by eye or 
“objectively.” From equation (10), we see that the intercepts of these 
lines should be the values of « linear function of the appropriate dose 
metameter. We may therefore plot the intercepts a; against various 
dose metameters in the same way as values of y were plotted against 
various time metameters, in an attempt to find the linearizing meta- 
meter. (The same thing could be done with the slopes, but if, as may 
often be the case, 6 is close to zero, these slope terms will show little or 
no consistent trend with dose.) The fitted intercepts are less subject 
to experimental error than are the individual observations and should 
give relatively clear-cut results. - 

We then have the suggested computational procedure: 

i. Plot the logits y within a dose against various time metameters 
i, and thereby ‘‘determine” the time metameter ¢. 

ii. it equation (10) to the data. 

iii. Ixxamine the deviations within doses chi-square os k; — 2m 
degrees of-freedom) as a final check on the chosen time metameter. 

iv. If the time metameter is judged adequate, plot the fitted inter- 
cepts a; against various dose metameters «, and thereby ‘determine’ 
the dose metameter wx. 

y. Fit equation (7) to the data. 

vi. Examine the deviations chi-square (>>; k; — 4 degrees of 
freedom) as a final check on homogeneity and/or the dose metameter. 

vil. If. the dose metameter is judged adequate, fit equation (8) to 
the data. 

viii. examine the non-additivity chi-square (1 degree of freedom) 


4 
| 
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| 
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4 


and thereby. choose between sili (7) and (8) as the final fitted 
form, 

The tests ef significance which have been alluded to should be con- 
sidered as approximations only, and in the interests of conservatism 
should be kept on the large side, say about 20% rather than 5%. 
Finally, we note that in step (iii) above, the chi-square could be large, 
even if the time metameter is adequate, because of a peculiar heter- 
ogeneity induced by having individuals from two or more populations of 
widely different susceptibilities. Furthermore, a chi-square or F value 
may of course occasionally be large simply by chance. ‘The experimenter 
should judge any test in the light of past experience with similar kinds 


of data. 


4. COMPUTATIONAL DETAILS FOR MINIMUM MODIFIED 
CHI-SQUARE ESTIMATION USING LOGITS 

Let (for 7 = 1, , m): 

pi, = the number of individuals at dose 7 which respond by time 
Ti 

ps; = the number of individuals at dose i which respond by time 
and which had not responded by time 7;,;-, , (2 < j < 

Piec+1 = the. number of individuals at dose i which “survive” 


(z.e., which do not respond by time 7; 


Tj = Pi + +++ + p;; , the cumulative response up to time 7,, at 
dose i, 

= log. [ri;/(ni — where n,; is the number of individuals 
initially exposed to dose 7, 


Wi; = | n, li a J = and 


| n; n; Jor. 


j= i, — 1}. 


Then for each dose, a matrix W, is formed; hove exemplified for hk; = 4. 


ty, 0 0 
Vii We V2 OF 
0 Vi2 Vis 

L 0 Vig Wi 
(The matrices have w’s on the major diagonal, v’s on the two secondary 
diagonals, and zeros elsewhere.) 
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With these matrices the quantities 7;; are computed as: 
In Yur 
Vix: Yirs 
For each dose, compute 
Si = +2 , the sum of the elements of W, , 


S, = » and (13) 
Finally, 
total chi-square = (14) 


The computations so far are basic and do not depend on the meta- 
meter choice. At this stage, follow step (i) of the preceding section 
plotting yi: , , Ys, against --- , . Having chosen the time 
metameter with values {t;;}, compute 


| tin 
Wi - |; (15) 


and then, fori = 1, --+ , m, 
Si = Si = 5 Si, = (16) 
The fitted values of {a; , 8;} of the within-doses model [equation 
(10)] are 
= (S; — 


i (17) 


The chi-square due to {a; , 8;} (2m degrees of freedom) is 


The analysis of the within-doses model is then completed by forming 


i i 
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the deviations within doses chi-square i, I; — 2m degrees of freedom) 
as (14)-(18). 

Step (iv) of the preceding section has been reached, and the &; may 
now be plotted against various x, as outlined in the preceding section. 
After the dose metameter with values x, , +--+ , 2,, is chosen, equation 
(7) is fitted by solving the equations 


i =| (19) 


| 6 > | 


for a, B, y, 6. The terms in these equations are formed primarily from 
the quantities of equations (13) and (16). 
We then form the chi-square due to a, 8, y, 6 as 
+47 t+ § (20) 
where &, 8, 7, 6 are the solution to (19). Then the deviations chi-square 
(>>; ki — 4 degrees of freedom) is computed as (14)-(20), thereby 
arriving at step (vi) of the preceding section. 

Finally, the first three of equations (19) (with the 6 column deleted) 
are solved for a, 8, y to fit equation (8) to the data. With 4, 8, 7 as the 
solution, the chi-square due to a, 8, y is 

The non-additivity chi-square is computed as (20)-(21). 
In summary, we have for the within-doses model: 


Source dfs Chi-square computation 
{a; , By} 2m (18) 
Deviations ky — 2m (14)-(18) 


and for the dose-time models: 


Source dfs Chi-square computation 
(a, B, y) 3 (21) 
6 (non-additivity) 1 (20)-(21) 
Deviations Dk: 4 (14)-(20) 


bag 
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Derivation of the results of this section, along with variance estimates, 
are given in the Appendix. The user of this procedure,-planning to 
solve equations (19) with a desk calculator, is strongly advised to 
consider the solution methods of Crout as given by Tintner [1952]. 
These are especially good in this case where the solution of a principal 
subset of equations is also desired. 


5. ZERO CLASSES 
In the definitions of the w,,; and v,; it is necessary that 


(where pi: + °°: + pi.a,41 = 7, , the number of individuals exposed 
to dose 7). That is, for each dose, there must be some responding 
individuals within each time interval and some individuals which 
survive the last interval. Some rule for “adjusting” the observed p,;; 
must be formulated to cover the case where these conditions are not 
met. A rule which in some sense at least, is an extension of Berkson’s 
*[1953] rule for the case of one fixed response-time is as follows. Suppose 
in the sequence p,;; , --* , ~s,4,+1 there are one to more uninterrupted 
subsequences of zeros. Each such subsequence, except possibly the 
first, has a non-zero number immediately to its left and each sub- 
sequence, except possibly the last, has a non-zero number immediately 
to its right. We subtract the quantity 4 from-each such “bounding” 
number if it occurs only to the immediate left or only to the immediate 
right of a zero subsequence. Then, in this case, a given zero subsequence 
is bounded immediately to the left and/or to the right by numbers 
which have been reduced by 4. The sum of these reductions is then 
divided equally among the zeros of the subsequence. Thus, if we have: 
1, 0, 0, 2, we get, after adjustment: 4, 4, 4, 14. If we have: 1, 0, 0, 0, 
we get: 4, 2, 4. 

On the other hand, if a bounding number is both a right bounder 
and a left bounder, then we subtract from it 4 to be added to the zeros 
on its left and $ to be added to the zeros on its right. Thus, if we have: 
1, 0, 0, 2, O we get: 4, x, v4, 14, 4. The quantities 7, are the result of 
dividing 4 + 4 = § equally among the two zeros. 

As a somewhat more realistic example, consider k; + 1 = 7 and 
Pir ys °°* » Pies = 0, 1, 2, 0, 0, 25, 0. After adjusting the first sub- 
sequence we get: 4, 3, 2, 0, 0, 25, 0 since the 1 bounding the first sub- 
sequence is a right bounder only. After adjusting the second subse- 
quence, we get: 4, 4, 14, 1s, 4s, 243, 0. After adjusting the third 
(last) subsequence, we get: 4, 4, 14, ry, vs, 244, 4. The sum of these 
(= n,) is 28, as it was originally. The reader may verify that 
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Pir» °°? y Pix, = 2, 0, 1, 1, 0, 8, 10, 7, O becomes, after adjustment, 
1}, I, 1, 72, 10, 64, be 
In the case where pi: , Pi = 0, 0, 0, 0, 25, say all 


individuals at dose 7 “survived”’), it does not seem proper to apply 
this rule in general (which in this case would give: }, 4, 4, 4, 243), for 
we would have no information from the data as to how much time it 
would actually have taken for dose 7 to cause a response. However, 
if some “lower” dose than the i-th did exhibit some response within the 
time 7,,, , then we might infer that the erratic result for dose 7 was 
simply due to experimental variation and go ahead and apply the rule 
to dose 7. Otherwise, we would have to discard the dose 7 data and 
assume dose 7 is “‘sub-lethal’”’ for the time allowed it. 

The rule is then extended in the case 7, > Ta, > *** > Take 
(which, as stated earlier, is good experimental practice anyway, where 
we assume dose 1 < dose 2 < --- < dose m). Starting with dose 1, 
discard all doses which exhibited no response until a dose is reached 
which did exhibit a response. Then proceed to apply the stated rule 
to that dose and the remaining ones. 

One might also consider, in some cases, grouping given time intervals 
for a dose so as to remove the zeros. Doing this after seeing the data 
certainly biases the results, but a little of this sort of juggling should 
not be harmful and is, in fact, the usual procedure in chi-square analysis. 
For example, if the lowest of the doses which exhibited some response 
had no response in the first of its time intervals, it appears more reason- 
able to pool its first time interval with the second rather than to assign 
some response to the first interval by an arbitrary adjustment rule. 
Strictly, the adjustment rule should not be applied to remove a zero 
in an interval unless some response was exhibited at an earlier time 
with the same dose or at a lower dose for the same or an earlier time. 

Another possibility suggested by Berkson [1953] is to follow a maxi- 
mum likelihood procedure through one cycle, as outlined in the Appendix 
and use the estimates of the parameters to estimate the zero observations, 
which are then used in a minimum chi-square method. We hesitate to 
recommend this in the present case, because of the amount of extra 
work it entails. It can probably be defended only on the grounds that 
it is large-sample convergent, but then so is the rule we have advanced. 

The reader is referred to Anscombe [1956] for an argument in the 
fixed observation time case (procedure A). The safest thing that can 
be said is that the time intervals should be so pre-specified that no 
zeros occur. This would require some “‘pilot’”’ research, in general. An 
“optimum” pre-specification of the doses and time intervals would be 
such that the cell numbers within doses are equal. A “perfect” ex- 
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periment can be designed after so much is known that there is little 
point in conducting the experiment. , 


TABLE 1 


Deatus Mice AFrerR INTRAPERITONEAL 
INJECTION OF CERIUM CITRATE COMPLEX 


Doses (mg/dg) 
Death 
time (hrs.) 50 100 150 200 250 
0-3 0 1 6 18 
33-12 0 1 12 8 5 
123-48 0 0 0 3 2 
483-168 1 3 2 1 0 
Survivors 29 25 10 ll 5 
No. exposed 30 30 30 30 30 


6. EXAMPLE 


In Table 1 we present data on toxicity to mice of a cerium citrate 
solution. These data, with a somewhat different time breakdown, 
have been previously reported by Graca et al. [1957]. The time intervals 

“were chosen to be the same at every dose but this is not necessary in 
the general theory. There were 30 mice exposed to each dose, so that 
Mm, = --+ = ns = 30, and m = 5 (the number of doses), k; = 4 (the 
number of response-times per dose). 

In Table 2, computations of the present procedure are illustrated 
for the third dose (§ = 150). The p,;; are adjusted in the second 
column, as outlined in the preceding section, and the r;; in the third 
column are cumulates of the adjusted p;; . Columns 4, 5, and 7 are 
self-explanatory. The y,; in column 6 are the natural logarithms of the 
elements of column 5. There is considerable rounding error in the 
various computations because the final results were actually computed 
with an IBM 650 and not with a desk calculator. The elements of 
column 8 are computed from those of column 2. Thus, .2536 = 1/6 + 
1/11.5, 1.0870 = 1/11.5 + 1/1, etc. Columns 9 and 10 are computed 
from columns 2, 7, and 8 as outlined in Section 4. Thus, 5.8435 = 
(4.8000)? (.2536), 57.7917 = (7.2917)* (1.0870), ete. And, —3.0435 = 
— (4.8000) (7.2917)/(11.5), —51.7101 = —(7.2917) (7.0917)/(1), ete. 
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From columns 9 and 10, we get 


5.8435 — 3.0485 0 0 | 
W, = —3.0435 57.7917 —51.710C1 0 
0 —51.7101 83.8196 —31.5185 
L O 0 —31.5185 34.0741. 
With W; and column 6, we compute, by matrix multiplication, ‘¢ 
| T—1.3863] [—9.1250] 
| _ Ww, 3365 | _ | — .9169 
A754 .6019 


The sum of these (= S°) and the sum of the elements of W,(= S3) are 

entered in the appropriate places of Table 3. The above process is 

‘followed for the other four doses, and the first three rows of Table 3 

are the result. Finally, we compute S}, , --- , S°, as the sums of cross- 
- products of y,; and 9;; . We then get 


5 
total chi-square = )* Si, = 75.54. 


t=1 


TABLE 3 
Summary Computations For ALut Doses 
Quantities Ist 2nd 3rd 4th 5th 
computed dose dose dose dose dose 
fs 50 100 150 200 250 

So .9670 4.2064 8.9847 9.1800 7.7583 
S; —3.7213 —7.1608 — .8070 —1.2798 5.3970 
t —.1541 — .7635 —5.5522 —5.3933 —7.2615 
tt .2347 1.4608 10.7489 8.7236 12.0209 
Sye .8656 3.0351 9.5609 9.8517 1.7339 
a —3.1219 —1.4641 .6754 .8230 1.9113 
B; 1.6390 1.3124 1.2384 1.6381 1.2988 
Zs 3.9120 4.6052 5.0106 5.2983 5.5215 


The computations have now gone as far as is possible without 
: specifying metameters. Now (or at any time after the logits y,; are 
computed) it is possible to “discover” a time metameter. A simple 


; 
| 
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plotting of the four y;; values of column 6 of Table 2 as ordinates against. 
abscissae t;, = 3, tig = 12, ti; = 48, tig = 168 shows a decided down- 
ward curvature. For this metameter, the \ of equation (11) is 1. Hence 
we know that » < 1. We then try t;; = log.7;,; (i.e, \ = 0) and find 
the downward curvature not entirely corrected, so that apparently 


Witht;; = —1/7;; = —1), we find an upward curvature 
so that apparently < < 0. Finally with ¢ = (ae., 
\ = —}) we achieve linearity, essentially. This metameter is checked 


with all five doses and appears to be appropriate. Its values are then 
entered in column 11 of Table 2. (In this example, this column is the 
same for all five doses, since the time breakdown is the same.) 

We then compute 


yy | 8774] _ | 14.9259), 
ys — .2887 10.5220 
1543) L 3.8418 


From these, for all five doses, the fourth, fifth, and sixth rows of Table 
3 are computed as equations (16). Equations (17) give the next two 
rows of Table 3, to complete the fitting of the within-doses model 
[equation (10)]. Equation (18) yields the value 65.05. We thus have 
the chi-square analysis: 


Source df. Chi-square 
(a; , By) 10 65.05 
Deviations within doses 10 10.49 
Total 20 75.54 


The value 10.49 as a chi-square with 10 degrees of freedom indicates 
an excellent choice of the time metameter. We are now prepared to 
“discover” the dose metameter. There was prior evidence to suggest 
log dose as the appropriate dose metameter, and this was borne out 
by plotting the &; of Table 3 against log 150, --- , log 250, showing good 
linearity. The natural logarithms of the doses are therefore entered 
as xz; values in Table 3. More generally, one would plot the &, against 
various 2, , varying \ as 


= log, é, 1=0 


with the same considerations of direction of curvature and values of \ 


(22) 
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as in the plotting of y,; against ¢, . [Incidentally, in the computing of 
logits or metameters, common instead of natural logarithms could 
be used, but our formulations are simpler when natural logarithms are 
used. Furthermore, in the definitions of metameters for \ ~ 0 in 
equations (11) and (22), it is not necessary to divide by \, but doing so 
introduces certain simplicities and insures that the metameters are 
increasing functions of the time and dose]. 
From ‘Table 3 and equations (19), we get the four equations: 


(i) 31.096a + 159.6498 — 19.125y — 100.6086 = 7.122 
(ii) 159.6492 + 823.8078 — 100.6087 — 530.7256 = —26.799 
(iil) — 19.125a — 100.6086 + 33.1897 + 174.0976 = 25.047 
(iv) —100.608a — 530.7258 + 174.0977 + 158.0366 = 127.041. 


The coefficients of these equations are very easily computed on a desk 
calculator with two sets of accumulating dials, as sums and sums of 
cross-products with the x; . The solution of these is approximately 


& = —15.825, 6=3.197, 74% = .1026, § = .2337, 
and from equation (20) we get 


(a, 8B, y, 6) chi-square = 59.29, 


(23) 


so that 
deviations chi-square = 75.54 — 59.29 = 16.25, 


with 16 degrees of freedom, indicating a good choice of the dose and 
time metameters. If this chi-square had been significantly large, we 
would suspect the dose metameter, if not heterogeneity, since the 
previous analysis confirmed the time metameter. 

We have so far fitted equation (7) to the data, but there is at least 
a logical advantage to estimation with the additive model [equation 
(8)], provided additivity can be assumed. This assumption can be 
tested. 

The first three of equations (23) with the 6 terms deleted have the 
approximate solution 


a = —15.368, B = 3.106, 7 = 1.316 
snd (21) gives 
(w, 8, y) chi-square 59.18, 
so that 
non-additivity chi-square == 50.20 — 59.18 = 


| 
i | 
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The final analysis of the dose-time models is then: 


__Bouree Chirequare 

(a, B, 7) 

1 Al 

Deviations 16 16.25 
Total 20 75.54 


The value .11 as a chi-square with one degree of freedom is an indication 
that additivity of dose and time may be assumed. |The same conclusion 
would be reached with an F test where F = (16)(.11)/(16.25) with one 
and 16 degrees of freedom.] 

The estimated response-dose-time (p, , 7) relation corresponding 
to equation (1) is then 


log. = —15.368 + 3.106 log, + 1.316(-2/7"). (24) 


Now, suppose we wish to estimate that dose which kills 509% of 
the mice in 168 hours. With p = 4 and7 = 168, log, [p/(1 — p)] = 0 
and —2/r'” = —.1543. Inserting these into equation (24) gives 
log.€£ = 5.0132 and £ = 150.4. To get the estimated variance of this 
from formulae given in the Appendix we need the inverse of the co- 
efficient matrix which, in this case, is the first three rows and columns 
of the coefficient matrix of equations (23). Its inverse, with consider- 
able rounding error, is 


6.634 —1.300 —.1183 | 
= | —1.300 2567 02903 |, 
02903-04995. 


so that the estimated variance of log,é, is 


__ 1, 5.0132, — .1543]A~'| 5.0132 = 


| 
OO4495. 
J 


— 1543 


Since the \ of the dose metameter = 0, we get the estimated variance of £ 
as (150.4)?(.004495) = 101.7 so that the standard error is ¥/101.7 = 
10.1. Hence the estimated LD; (168) is 150.4 + 10.1. 

Graca et al. estimated this as 146.6 + 9.7. ‘The agreement is ex- 
cellent in view of the fact that their analysis differed from the present 
one in that they followed procedure A of Section 1 (using a 168 hour 
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response time), their response transform was the probit instead of the 
logit, and they used an iterative maximum likelihood estimation instead 
of minimizing the modified chi-square. We have probably increased 
the estimated variance slightly by “dubbing” in the zeros of the first 
dose. The chief advantage of the present procedure is that, in yielding 
equation (24), it gives a superior description of the response-dose-time 
relation. We could even, with some danger, extrapolate beyond 168 _ 
hours and are certainly enabled to make a reasonable estimate of an | 
LD ,oo,(r) for any 7 in the range 3 < +r < 168. 


APPENDIX 
Estimation by maximum likelihood 
We replace equation (3) by the form 


pé, r) = g(Y) (25) 


where Y = Y(é, 7, 6,, --- , 0,) isa function of &, 7, and the s parameters 
Let 


Dis = PEs, ,k. 


The probability that a random individual from the population will 
respond to dose { within time 7,, is equal to the probability that it is 
contained in that segment of the population which responds to é, in 
time 7,;, , that is, p,, . Similarly, the probability that this individual 
will respond within the second time interval is pj. — pi , etc., and the 
probability of its surviving is 1 — py, . Hence, if p;; are the observed 
numbers of responses as in Section 4, the likelihood of the observations 
for dose &; is 


Ti 


where = Pir, = Dis — 2 = 1 — - 
The log of the likelihood for all doses is 


L = > (pi log Tat + Pi log 


from which, noting that Ep;; = n,x,; , we find 
m ki 
06, fol 


s=1 jf 


f. 
& i; 
| i 
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and 
OL 


m ki-l 


h,h’ = ,8, (27) 
where [with z;; = g’(Y;;)] 


ci, = , Wi; 


The maximum likelihood estimates are obtained by solving 0L/00, = 
= 0L/00, = for 6, , --- , This may be done iteratively by 
noting that if 6? , --- , 6? are estimates of non-zero efficiency, then to 
the first order of asymptotically small — 


= aL 
00, + 2 + 


where (66?) = 0, — 6). We define the matrices 
C; = U,; = col fu; , 


A = col [(66{), , (56°)] 


and W; as in Section 4 but here using the w,; and v;, defined above. 


Setting 01/00, = --- = 0L/06, = 0 and making use of equations 
(26), (27), and (28) gives 


b cae, = ciw.u, (29) 


where the elements of C; , W; , and U, are evaluated at the parameter 
point @ = 6°. These equations may be solved for 6, , --- , 6, and the 
values used to replace 6) , --- , 6° in the next cycle, in the familiar 
iterative process. 


If we define the working observation (cf. working probit, Finney 
[1952]) 


= cio + ; (30) 
hel 
(29) becomes 
= (LW (31) 


where = col (6, , , and y; = col , , This sub- 
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stitution is particularly useful in the usual case that Y,; is linear in the 
6,. In that case = and (= Y?,) is analogous 
to the expected probit (Finney [1952]). For example, in the form of Y 
suggested by equation (6), we have (6, , --- , 6:) = (a, B, 7, 6) and 
Ch = 1, = Zs, Cis = = . 


Estimation by minimum modified chi-square 


The quantity 


m 

t=1 j=1 
is the familiar Pearson’s chi-square with k, + --- + k,, degrees of 
freedom. An asymptotically equivalent expression is obtained by 
substituting observed values in the denominators to give 


m 


t=1 j=1 


We may obtain an empirical observation y;; (cf. empirical probit 
Finney [1952]) by inverting the relation 


= 


[g.v. equation (25)] where r,, = pi: + --- + ;;. So, to the first order of 
asymptotically small quantities, 


Vii 
ny + g (yi i; Yii) 


Pii 


ta = + — ya); 


n; 


nN; 


where z;; = g’(y:;), the empirical value of g’(Y;;).. Making these sub- 
stitutions in (32) gives 


m ki ki-l 


where 


i 


4 
| 
| 
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1 
Wi; niet (+ + 
Pi; 


which may be defined as the empirical values of the w,; and v;; of the 
maximum likelihood procedure. The quadratic form (33) is asymptoti- 
cally distributed as chi-square with k,; + --- + k,, degrees of freedom. 


The Y,; are functions of 6, , --- , @,. All other quantities in (33) are 
empirically determined. 


In the usual case where Y’,;; is linear in 6, , --- , 6, (te. Yi; = 
wr c;,9,), it is easily shown that values of 6, , --- , 0, which minimize 
(33), and are hence the minimum modified chi-square estimates, are 
given by 

(34) 


where @ and C, have the same definitions as in (40) as do W, and y; 
with the exception that empirical instead of expected values are used. 
In the further special case that the logit transform is used, we have 
= gsi) = 1/0 +e") 
so that 
ys; = log, [ri;/™: — 
= — ri , 
and w,; ‘and v;; are as defined in Section 4. The estimating equations 


given for the various models follow as special cases of the above de- 
velopment. 


Estimates of variability 


Suppose a and b are consistent estimates of their “true” values and 
r = a/b. Then expanding around the true values of a and b we get, 
to the first order of asymptotically small quantities. 


Ar = (Aa — rAb)/b. 
So, asymptotically, 


Var (r) = [Var (a) + 7° Var (b) — 2r Cov (a, b)]/b’. (35) 
Let 
a=at b= by + 
v,, = Cov (6, , 6), 
where the 6; are consistent estimates of the s parameters 6, +++ , 6 , 
and the ay, «++ ,@, and by , «++ , b, are known constants. “Then 
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Var (a) = A’VA, Var(b) = B’VB, Cov(a, b) = A’VB = B’VA 
where 
V = p,;], A = col(a, , --- ,4,), B = col (b, , --- , b,). 
Inserting these into (35) gives, asymptotically, 
Var (r) = (A’ — rB’)V(A — 7B)/b’. (36) 


We may use this to derive asymptotic formulae for the variance of 
an EDyo,(r), estimated dose to cause a response ratio of 100p% in 
time 7, or of an ET’ ,oo,(€), the estimated time required to have a response 
ratio of 100p% with dose ¢. The asymptotic efficiency of the minimum 
modified chi-square or the maximum likelihood estimates gives the 
result that V may be taken to be the inverse of the coefficient matrix 
used to get the estimates. This matrix is of the form )>; C/W;C; and 
isa 3 X 3 matrix ora 4 X 4 matrix depending on whether the additive 
or the non-additive dose-time model is used. 

On the metametric scales, the following are then obtained directly 
from (36). 


Additive dose-time model: 
Var [EDyoo,(7)] = (37) 
Lt 
Var [ET = 51, z, t)V; (38) 
Lt 
Nen-additive dose-time model: 
1 
1 x 
Var [EDyw,(7)] @ + 80° (1, t, zt) V, (39) 
zh 
(1) 
Var = (1, x, t, rt) Vs (40) 
rl) 
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Equation (37) was used in the example of Section 6. 

The conversion of these to the original (non-metametric) scale has 
been illustrated for the ED,o,(7). Suppose the metameter wis in the 
class given by all real \ m equations (22). Then if .é is an-cstimate on 
the metametrie scale, 


Ste 
| 


if r»\ 0, 
if 0. 

So, to the first order of asymptotically small quantities, 
Var (£) = Var (4). 


Thus if we multiply a Var (£), given by one of the Var [/D,,,,(7)] 
formulae above, by £~** we get an approximate formula for the variance 
of the original scale. This assumes, of course, that the metametric 
transform is in the class of equations (22). Similar considerations apply 
to the formulae for Var [ET ,o0,(é)]. 

It might perhaps be better to derive confidence limits on the meta- 
metric scale and then convert these limits to the original scale. This 
would have the disadvantage of losing the symmetry of the limits on 
the original scale and it is doubtful that this would increase the accuracy 
of the limits. 

The variance formulae given assume an absence of helerogencity. 
Finney [1952] recommends that, in the presence of heterogeneity (as 
indicated by a significantly large deviations chi-square not attributable 
to inappropriate metameters), the variance formulae should be multi- 
plied by a heterogeneity factor given, in this case, by 


(deviations chi-square) 
(Uk: — 4) 


We cannot endorse this practice without some misgivings since the 
effect of heterogeneity is additive, not multiplicative. If heterogeneity 
were so large as to have an essentially “multiplicative effect,” it would 
be better (and simpler) to apply straight least squares methods to the 
various models rather than the minimum modified chi-square methods. 

It must be pointed out that a variance estimate derived from a single 
assay does not necessarily bear any relation to the variability over 
repetitions of the assay. This problem is common to all experiments. 
If a single quantity is estimated in a number of experiments, the vari- 
ability of the different estimates is equal to the sum of an “intra- 
experiment” component and an “inter-experiment” component. The 
latter component is not estimable in a single experiment. In the 
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words of Dews and Berkson [1954], --- the only way to a certain the 
standard error of repeated assays is actually to perform repeated 
assays ” 
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INTERACTION OF GENOTYPE AND ENVIRONMENT 
IN CONTINUOUS VARIATION: II. ANALYSIS 


R. Morey Jones ano Marner 


Unit of Biometrical Genetics, Agricultural Research Council 
Department of Genelics, University of Birmingham 
Birmingham, England 


In an exriier paper [Mather and Morley Jones 1958] we have shown 
how the phenotypic effects of interaction between genotype and en- 
vironment can be represented by quantities denoted by g, orthogonal 
to the genetic and environmental parameters, and how the contribution 
of these interactions to phenotypic variation can be expressed in terms 
of quantities denoted by G, which derive essentially from the variances 
of the g’s. These G’s appear separately frorn the genetic items, D and 
H, and so may be regarded as distinct components of variation. The 
EH component, however, explicitly depends in part on the g’s, as well 
as on the e parameters measuring effects of the environment, so that a 
certain confounding of the quantities appears in the expressions de- 
scribing the variation. 

These expressions, which are set out in Table 1 (a), show us how 
genotype-environment interactions influence the variances and ¢o- 
variances by which variation is measured. We must now examine the 
requirements for successful separation of the effects of genotype- 
environment interaction from the other components of variation, and 
for the estimation of these components by comparisons and analyses 
of the variances and covariances which are obtainable in experiments. 

Now, where all the genotypes are distinguishable and the pheno- 
types associated with them measurable in all the environments, complete 
analysis is simple even if it might sometimes be a little tedious. Every 
d, h, e, and g can then be estimated and they can be combined into 
whatever composite measures may be desired. With genes of major 
effect, like the familiar mutants of mendelian genetics, and with gross 
or controllable differences in environment, such as those between 
seasons or between artificially maintained temperatures, complete 
ascertainment, measurement, and analysis ean be achieved. They 
can even be achieved in respect of certain broad, but important, com- 
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TABLE 1 


iv Terms or Gp, Gy, AND E, AND (b) THerr TRANSFORMS 


(For definitions of the symbols denoting the variances, 
see Mather and Vines [1952] Table 6.) 
Variance (a) (b) 
pe) EwtV 7 
Ver Ew 6 
Vire +3Gp+iGut Ea +36 
Virs 
Viva aH 3D+ 
Vora tD+ 33H {D+ 
Voss 1D+ 6H {D+ 436 
iVso, 
VV 


Note that all the Z’s contain some ga , being of the general form Ejz) = VetzE0,) - In form (a) 
the parental and backeross variances contain an additional term due to their summation in pairs. 


parisons where polygenically determined continuous variation is in- 
volved. Thus a pair of true-breeding lines and their F, can be raised 
each in the same range of environments so distinguished from one 
another that differences between them are large in their effects when 
compared with variation within them. Data from such an experiment 
are given by Jinks and Mather [1955] who grew two inbred lines of 
Nicotiana rustica and their reciprocal F,’s over a number of seasons in 
each of two parts of the country. A composite d and h can then be 
estimated measuring respectively the effect of the compound. genetic 
differences between the homozygous lines and the combined properties 
in dominance of the genes involved, as can e’s measuring the overall 
effects of differences in season and site, and g’s measuring the interaction 
of the compound genetic and environmental differences. Generally, 
however, the situation with continuous variation will be more complex. 


Random distribution over environments 


With continuous variation, mediated by polygenic systems, 
individual genes cannot be followed in inheritance. Statements of 
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genetical constitution must be based on knowledge of ancestry, fortified 
by such observations or assumptions as can be made about the operation 
of linkage and selection. These statements are therefore of a statistical 
kind, sufficient to permit representation of the effects of segregation in 
terms of the components of variation, but the genotype of individuals 
can in general be neither recognised nor inferred. Nor can all or even 
most of the differences in environment which materially affect the pheno- 
type be recognised and isolated. We need, however, do no more than 
recognise that where families of individuals are grown over correspond- 
ing ranges of environments, the variation they show will contain, for 
corresponding genotypes, the same components of variation in respect 
of environmental effects and genotype-environment interactions. 

Where the families are grown over a comparable range of environ- 
ments, and are large enough to render negligible the sampling variation 
in respect of genotype (if segregation is occurring), and of environments 
(whether segregation is occurring or not), the differences between their 
mean phenotypes will reflect only their differences in average genetical 
constitution, as defined in the earlier paper. Thus if we compare the 
means of a set of large F; families grown over comparable environments, 
the expected variance, Vir; , of these means will be 3D + 3H. To 
this will be added any variation arising from overall differences in the 
environments experienced by the families, such as arise when, as is 
commonly the case, they are grown in different plots in the field or 
grown in different culture containers in the laboratory. This situation 
will be treated in more detail below: it is sufficient now to note that 
environmental effects within the families, and the interactions arising 
from thei, will appear only in so far as they contribute to sampling 
variation. Where we can legitimately average over environments, 
therefore, only the genetic components appear explicitly in the variation 
of family means. 

It is worth digressing a moment, however, to note that although 
in the absence of genotype-environment interactions these genetic 
components are unconditionally definable, they themselves will not 
be free from influence by the environment if interactions are present. 
Indeed they cannot be, for, as we saw in the earlier paper, they measure 
the average effects of the gene substitutions and thus must change with 
change in the range of environments where interactions are operative. 
The same point. emerges explicitly when we Jook at the situation the 
other way round, from the point of view of the environment:al com- 
ponent of variation. Using the parameters as we have defined them, 
the variation of environmental means depends explicitly on the g, 
parameter as well as the e’s. For any particular family, the g, could be 
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removed by assimilation to e, but would reappear in families of different 
genetical constitution. In other words, where genotype-environment 
interactions are operative, the measures of genetic effects apply only 
to the range of environments sampled, and the measures of environ- 
mental effects apply only to the range of genotypes sampled; though, 
of course, the measure of the interaction components of variation not 
merely tell us how variable are the genetic effects over the range of 
environments in question, and vice versa, but also serve to show us, at 
least up to a point, whether the genetic effects are sensitive in general 
to environmental changes which may extend beyond the range tested, 
and vice versa. 

Returning to our main theme, although the variation between 
means of large families will reflect only the genetic parameters as defined, 
variation between individuals within families will reflect the differences 
amongst the environments of the individuals and also the interactions 
which spring from these differences. Variances of genetically uniform 
families, parents and F, , will of course include no genetic components; 
they will reflect only the main effects of environmental differences 
together with the interactions of these differences with the particular 
genotype of each fainily. The mean variance of the two true breeding 
parents will be y = V, + V;,, , e and g, being defined as in Table 4 
of the earlier paper, and )-g, indicating summation over all gene 
differences between the parents in each environment taking sign into 
account. The size of this term V;z,, will obviously vary according to 
the distribution of the gene differences between the two parental lines. 
The difference between the two parental variances is the covariance of 
eand > gz. , and is of little further interest to us in the present connection. 
It will be seen too from Table 4 of the earlier paper that the variance of 
the F, family will be 6 = Veesse,)- 

Now provided all families are distributed at random over the 
environment we can use y and 6 to correct the variance of F, for en- 
vironmental and interactive variation. We then find 


where a = G, — Vz,, and B = Gy — Vsz,,. Indeed all the variances 
of segregating generations can be written in terms of D, H, a, 8,7, and 6 
as in Table 1, the coefficients of y and 6 being in all cases the proportions 
of homozygotes and heterozygotes expected in the family in question. 
Since the structures of the variances are expressible in this way, it is 
clear that Gp can never be disentangled from V;,, nor Gy, from V3,, . 
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Then, although variances within families grown over the same range of 
environments allow D and H to be estimated separately from the 
effects of environment and interaction, the G’s cannot be estimated. 
The values found for a and 8 will reflect not merely the set of genes 
segregating within the families, but also the distribution of the genes 
between the parent lines, even where linkage is not complicating the 
issue. Since Gp = >, V,, we can in fact writea = >> V,, — V3,, = 
— > Woee.0as) » SUMMation proceeding over all pairs A — a and B — b 
of genes. With alleles giving interactions of the same sign associated 
in a parent, the covariance will be positive, and with genes whose 
interactions are of opposite sign together in a parent, the covariance 
will be negative. Where extreme parents are crossed, and, as might 
often be expected, the alleles associated in each parent tending to 
give g.’s of the same sign in a given environment, the contributions to 
W would all be positive and a would be negative; but the effects of 
interaction might tend to balance each other either because of variation 
in sign of the g,’s or because of departure from unidirectional distri- 
bution between the parents. 

Three further points remain to be made about the analysis of such 
data. The variances of family means shown in Table 1 assume that the 
families are large enough for sampling variation to be negligible. Where 
this is not the case, they will be inflated by an appropriate item for 
sampling variation, and to this extent will depend on the non-heritable 
and interactive components as well as on D and H. Secondly, covari- 
ances, such as that between F, parents and means of F; progeny, wiil 
not in general be capable of use in analysing variation where genotype- 
environment interactions are present, as the parents may be raised 
at a different time or in different environments from their progeny. 
Even though G’s do not appear in the expression for the covariances, 
they will have their effects on the covariances in so far as the values 
assignable to d and h for any gene will change over environment where 
the different families are separated in time and space. Only if parents 
and progeny can be raised and measured side by side in the same range 
of environments, as may be possible for example with asexually propa- 
gable plants, could covariances legitimately be included in the group 
of data for analysis. 

Finally, it will be seen from Table 1(a) that the Z components can 
be written in the form 


and therefore involve three constants. It might appear, therefore, that 
if enough different types of segregating generations were available, 
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these constants, together with D, I, Gp , Gy, and Vy,, could be esti- 
mated. But since, as we have shown, the six quantities depending on 
environmental effects and gene-environment interaction are always 
involved as linear combinations of the four quantities a, 8B, y, and 6 
[Table 1(b)], the estimation of the G’s themselves would be possible 
only if we were prepared to make further assumptions about the relations 
among the six quantities. 


Non-random distribution over environments 


We have been considering the analysis of results obtained where the 
individuals of all families are distributed at random over a common 
range of environments. In most experiments, however, this has not 
been the case, and frequently indeed experimental convenience may 
rule it out. It is desirable therefore to extend consideration to the 
situation where each family is raised in one or more relatively compact 
groups, which are themselves distributed over a wider range of en- 
vironments. Thus with plant experiments involving F; families, each 
of them is commonly grown in its own individual plot, the various plots 
being distributed at random over the block of ground which also includes 
all other families. The environmental differences affecting the variation 
of individuals within the plot are then different from those affecting 
the variation in plot and family means. This is recognised in the type 
of analysis described by Mather [1949] by the use of EH, and EF, to 
represent the two different environmental components of variation. 
Environmental variation within plots, measured by E, , is expected to 
be smaller than that between individuals in different plots, so that the 
environments of individuals grown within a plot are to this extent 
correlated. 

Let us assume that the contribution made by environment to the 
phenotype of an individual is broken down into two parts, the one 
being the deviation from the plot means (,e) caused by environmental 
differences within the plot, and the other the deviation from the gross 
mean (,e) caused by environmental differences between plots. Let us 
assume further that the variance of ,e is the same for all values of ,e 
apart from sampling variation. We will assume, too, that the same is . 
true of the interaction with genotype to which the different environ- 
ments give rise. 

Considering a pair of alleles A and a, the three genotypes which 
they generate will give rise to phenotypes 


AArmtdtiet et ige t+ 290 
Aa:m+ht+ t+ 2 + ign + 2% 


aa:m— d+ 2 — 194 — 19h 
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where m is the mid-parent. Now with /; families each grown in a sepa- 
rate plot, those derived from an AA plant in F, will have mean m + 
d + 2¢ + 294, those from an Aa parent will have mean m + 3h + .e + 
329, , and those from an aa parent will have mean m — d + .€ — og. 
It is then not difficult to show that the variance of F; means is 
Virs = + + + + 

provided the family is large enough to make sampling variation 
negligible. 

Turning to V2r3 , the mean variance within families, we note that 
for wholly AA families this will be V;,..,,,) and for aa families it will be 
Vise-.0a)- For segregating families from Aa parents, we can split the 
within family variance into components within and between genotypes. 
Within those quarters of the families comprising plants of genotype AA 
and aa it will be V.,.4,,,, and V,,.-,,,) respectively, and within the 
halves of genotype Aa, V,,.+,.,,. The variance between genotypes 
will not contain any ,¢, gz, Or 1g, , since they will be averaged out. Nor 
will it contain any ,¢ since it is taken_round the plot means, but it will 
contain .g, and .g, since the effects of the gene substitution will vary 
with the plot in which it is being measured. In fact, the variance 
within segregating families will be 


[2 V + + BV + [3(d? + + + 


Then, combining the variances of the three types of family in their 
expected proportions of occurrence, }AA: jaa: }Aa, we find 


Vors = FAV + + 
+ + HE + Vi + V..)) 
which added to Vir; gives for the total variance of the F; generation 
Vies + Vors = 2° + + + + Vien + Vien) 


I] 


as would be expected. 

Direct calculation of the variances becomes tedious when more than 
one gene difference is admitted. All the results may, however, be simply 
derived from the known formulae for the variances and generation means 
in a constant environment. The method may be exemplified by obtain- 
ing the generalisation of the above results for F; to any number of 
unlinked genes. We observe first that the total F, variance may be 
resolved into the mean variance for a fixed environment of the range of 
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environments in question together with the variance of the generation 
mean over this range. hus, putting ,@p for >> V,,, and so on, we 
ean write down at once 


Virs + Vors + iGp + Gp) + + iGy + Gn) 


V (0432.00) + V ’ 

since ,e and the ,g,-., are uncorrelated with ,e and the 29,,.,. 

Next, for the variance of family means, which we have assumed also 
to be plot means, we have 

Virs = variance of family (plot) means 
mean variance of family means in plots of similar environ- 
ment + variance of generation mean over plots 

= 3(D + + + 2Gu) + 

Finally, V2r3 is obtained by subtraction. 


TABLE 2 
CoNSTITUTION OF VARIANCES IN Parent, F; , SELFED (F'), AND Srp-MaTED (S) 
GENERATIONS WHERE INbIVipUAL Fami.ies IN F; , F's , AND S; ARE GROWN 
Eacu 1n Its Own Pitot, So Tuat THE Errects oF ENVIRONMENTAL 
DirFERENCES Wituin (Surrix 1) AND BETWEEN (Surrrx 2) PLots 
Must Be DistiNncuiIsHep. FURTHER DETAILS IN THE TEXT, 


Variance Within plots Between plot means 
Vri Ve +2104) £r ¢+Za0a) 
Vp2 08) 
Vri Vo Ve 
+ V on) 


oat Vi, e+1/22, 


Virs +32Gp + 

ri, #+1/42,0n) 
Vors + Gp + + = 

§2Gn + Vo on) 

Virs 3D+ 

Va e+1/8E20n) 
Vara §D+ + + + = 

ve:Gu 

Viss — (D+ Vell + 

Vi 
Voss tD+ eH — 
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The three variances to be found in the /’, generation can be obtained 
by essentially similar means. Turthermore, where a generation such 
as F does not fall into more than one type of family, its variance can 
still be measured between as well as within plots, if the common practice 
of growing several plots be followed. Then, recalling that ,G> = 
> V,,. and so on, the results set out in Table 2 are obtained for genera- 
tions up to F,. 

It will be seen from the table that, in general, all the variation 
within plots appears in the variance of highest rank, which has the same 
coefficient for ,Gp as for D and for ,G,, as for H. The variation between 
plots attributable to environment and its interactions appears in the 
variance of next highest rank, and all the lower rank variances contain 
only D and H. : 

In principle the analysis of such an experiment can yield estimates 
of all the components of variation except the environmental and inter- 
active components within plots. In respect of these only the counter- 
parts of a, 8, y, and 6 can be estimated, and these will appear in the 
different variances with the coefficients given in Table 1. This type of 
experiment can therefore yield more information than the simpler type 
in which individuals are distributed entirely at random over the whole 
range of environments, with no grouping of families. The design and 
analysis of experiments for separating and measuring genotype-environ- 
ment interaction are, however, obviously matters meriting further 
investigation. 


Summary 


The problems are discussed of analysing continuous variation into 
its components where genotype-environment interactions are present. 
With all individuals of all generations distributed at random over the 
same range of environments, D and H may be isolated but the inter- 
action and environmental components cannot be isolated as such. 
They appear in the form of certain transforms, viz. a = Gp — Vz», , 
B= Gy — = Ve + Vz,,, and 6 = Where, in genera- 
tions such as F; , the different families are each raised in a separate plot 
or group the effects of environmental differences may be separated into 
those within plots and those between plot means. The interaction 
components of variation between plots may then be isolated, but 
analysis of variation within plots is again possible only in terms of the 
transforms. 
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EXPERIMENTS WITH TWO TREATMENTS PER 
EXPERIMENTAL UNIT IN THE PRESENCE 
OF AN INDIVIDUAL COVARIATE* 


P. Cox 
National Institue for Research in Dairying 
University of Reading 
Reading, England 
INFRODUCTION 


Temporal or spatial trendS-im experimental material can sometimes 
be largely eliminated from treatment.cemparisons by the stratification 
involved in, for example, Latin-square designs. As an alternative, 
covariance analysis on the trend variable is available and a recent 
example on this has been given. by Federer and Schlottfeldt [1954] and 
further discussed by Outhwaite and Rutherford [1955]. Both of these 
methods eliminate the mean curvature over a set of experimental units 
or of replicates and while this may suffice in, for example, field experi- 
ments with neighbouring experimental units, it is less likely to be 
adequate when the latter are selected at random from some large 
population, or when treatments are compared ‘‘within” a biological 
individual, for example, within a lactating cow, as shown by Cox [1956] 
or a tree—Rasmussen [1956]. The switchback trial with two treatments 
per block, for example, Lucas [1956}, allows for individual linear trends 
by confounding treatment comparisons with the second-order curvatures. 
These, however, cannot always be safely neglected. 

An analysis eliminating individual curvatures of order up to one less 
than the number of periods, using sets of orthogonal squares, has been 
given by Patterson [1951] and this design also allows after-effects to 
be estimated. By the use of intra-period observations, the author 
[1956: 3, 7] has shown that individual curvatures of order limited only 
by the number of independent observations per individual may be 
removed to isolate the treatment comparisons in Latin-square designs. 
This particular design restriction has now been relaxed and a general 
solution giving within-individual treatment comparisons free from 
general curvature effects, has been obtained. Population treatment 
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effects sre examined by combination over individuals and the intro- 
duction of restrictions, such as balanced treatment sequences, gives 
particular cases with correspondingly simplified analyses. The case 
where there are two treatments per individual or block will be treated 
here because it is a flexible and commonly used arrangement. 

It may be noted that, when there are more than two treatments, 
each appearing not more than once in any block, and linear covariance 
regressions are characteristic of the individual blocks, the analysis by 
Zelen [1957] is appropriate. 

For convenience the following exposition deals directly with the 
removal of time trends in the experimental material using equally 
spaced observations, this being a common practical case. It will be 
clear from the analysis that, mutatis mutandis, the solution is applicable 
to the more general problem of allowing for any individually character- 
istic covariate with unequally spaced observations. 


THE DESIGN AND MATHEMATICAL MODEL 


A complete experiment will consist of a number of individual 
experimental] units ideally taken at random from the population and 
circumstances to which any findings are to be inferred. The time for 
each individual is divided into three periods containing numbers ¢, , s, 
and ¢, equal and successive intervals. Two treatments, of which the 
main effects are to be compared, are administered, one during each of 
the first and last periods of an individual; one independent observation 
being obtained in each interval. Observations during the intervening 
period of s intervals are not required and in many cases it will be possible 
to take s = 0; this period may be generally useful, however, particularly 
in biological applications, to build up the administration of the second 
treatment so that the full effect obtains in the last period. 

lor the complete experiment a mixed model is used with fixed 
treatment effects, random individual and interaction effects, as: 


= Ke + BruPrui + 06; + + 


where, 


(i= 1, +2-+ 1, + 24-2, +s+ t) is the 
value observed in the 7th experimental interval for the kth 
individual, the jth treatment being applied in that interval; 

x, is the mean for the kth individual; 2 

By is the uth covariate parameter for the kth individual; 

P.,, is the value in the 7th interval of the uth general orthogonal 
polynomial [IXendall, 1946; Cox, 1957]; 
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, (j = 1, 2) is the jth treatment parameter; 

6. is the interaction between the jth treatment. and the Ath indi- 
vidual, and 

é is the term for the unexplained variation which, as usual, is 

assumed to be distributed about’ a zero mean with constant 

variance. 


ANALYSIS 


In appucation to an individual the treatment and interaction effects 
are confounded and it is temporarily convenient to drop the suffix k 
and write ¢; = 6; + 6;. The least-squares normal equations are then 
easily obtained as: 

(t; + + td: + td. = K 


Bu + — = B 


+ + tid, 
tox — JuBu + tebe 


T, 


where, 
K is the total of the observations for the individual unit, 


T, and T, are the observation totals for the first and second treat- 
ments respectively and 
ty 
= > P.;. 
i=1 
These equations are not independent but suffice to isolate the com- 
parison (¢, — ¢2) in the solution: 


This can be written in an alternative form using properties of the 
orthogonal polynomial coefficients. For let xz; be the value of the 
covariable in the ith interval, let S, = >>; x , and let 


So Sy 
= Ss Baath 


Then, cf. Kendall [1946]—observing the slight difference in the definition 
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of AW’ adopted for subsequent convenience 


where A‘” is the same as A™ except that the elements in the last row 
are 


= (w = 0,1, --- ,w 


and A“ is similarly derived from A“ by replacing the elements in its 
last row by Sy. = , (w = 0,1, , u). 
Equation (1) above now becomes 
(u)? 


tt A 


(2) 
E — oT, ] 


Further, since 


he _ So?” 
ttt So 
and, similarly, 


the bracketed terms on the left and right side of equation (2) are the 
Schweinsian expansions [Aitken, 1939] of A™/A™ and A{?/A™, 
respectively, where, 


(ta) (ti) (ti) (ts) 


(ta) 
0 


* 


gin 
So 


Sar 
Ilence we may write (1) and (2) concisely as: 
— = (3) 
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The sum of squares for treatments and interactions, eliminating 


regressions, is obtained in the usual way as p,(¢, 


- 


t,t, 
L+t 


Pr = 


— ¢2)”, where 


From the determinantal form given above it can readily be shown that 


— ¢2) is orthogonal to K and B, (u = 1, - 


variance is therefore as shown in Table 1. 


- , n) and the analysis of 


TABLE 1 
ANALYsIS OF VARIANCE TABLE 

Source of Variation df. 8.8. 

Mean 1 K’/ (4: + t) 
Treatments + interaction i t + 
eliminating regressions 7 
> ( ) 
+ ty u Pos 
Linear regression Bi 
Quadratic regression 1 B; 
nth order regression 1 B 
Unexplained variation ti + te by difference 
—n—2 
Total vii 


Further, since t, + t, = S, and K = So, , the regression terms and that 
for the mean, comprise the Schweinsian expansion of — A‘ /A™ [Cox, 


1957], where, 
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and hence the analysis for an individual set of observations may now be 
written as in Table 2. 


TABLE 2 
ANALYSIS OF VARIANCE FOR AN INDIVIDUAL SET OF OBSERVATIONS 


Source of Variation d.f. 8.8. 


Treatments + interactions eliminating 


regressions 1 
Regressions (including mean) ignoring 
treatments n+1 
Unexplained variation t; + t2 — (n + 2) | by difference 
Total htt vis 


Assuming normality, the significance of the treatment + interaction 
effects on the individual can then be tested against the mean square for 
the unexplained variation. And in new applications the individual 
mean squares may also be examined for homogeneity by standard 
methods. 


THE COMPLETE EXPERIMENT 


A complete experiment will consist of a number of individuals each 
receiving one of a set of treatment pairs. If the significance of the 
difference between members of a treatment pair is obtained in each case, 
a simple non-parametric test over individuals may be useful. In many 
applications, however, it will be safely possible to make the assumption 
that the individual interactions are normally distributed permitting the 
complete experiment to be analysed according to the model: 


(bux dor) = + & (4) 


where 6,, and 6., are the parameters for the first and second treatments 
given to the kth individual and 4, is the term for the interaction and 
unexplained variation. The analysis of this simple model will, of 
course, depend on the particular design by which treatment pairs are 
allocated to individuals; for example, if it is required to examine all 
treatments with equal precision, the arrangement using all possible 
treatment pairs, one per individual, with suitable replication, is one of 
the simplest. Other arrangements are readily envisaged and less balance, 
with unequal precision, may be useful in particular cases. 

The significance of treatment effects is examined by testing their 
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mean square against the interaction mean square in the usual way. 
Additionally, it can easily be shown from (3) that the variance of 
(diz — bax) is where is the variance of single observa- 
tion. Hence, if the within-individuals analyses are multiplied by 
A/S, they can be brought into a combined analysis on the basis 
of a single individual, the treatment plus interaction terms being 


' partitioned according to (4). The interaction meam square may then 


be tested against the unexplained variation and it is perhaps worth 
noting here that significant interactions are sometimes an indication 
that a higher-order regression model is required for the individual 
analyses. 

Finally, it will be appreciated that the effects estimated will be 
those for differences between treatments; quasi-treatment means can, 
however, be quoted on the assumption that >>, 0; = 0 as, for example 
with v treatments, 


4). 


j-1 


PRECISION 


The variance of the mean difference between a pair of treatments 
can generally be written as ko” where k is a factor depending on the 
design, the number of treatments, and their replication, and o’ is esti- 
mated by the mean square for the interaction term. For example, if 
each of all the possible pairs from v treatments is applied to one individual 
and replicated r times, k = (2v — 3)/(v — 1)’r. In some cases ex- 
perience may indicate that the assumption of zero interactions is valid 
so that a pooled estimate of o” becomes available. In general, however, 
o” will include the term given above for the variance of (¢. — dz), 
that is, (A /A%”)o7 , of being the variance of a single observation. The 
interesting question thus arises of finding that partition of the total 
number of observations per individual which — other things being equal— 
will minimise the contribution of the unexplained variation to the 
variance affecting treatment comparisons. This problem has been 
solved in the case of linear gradients (n = 1) as follows. 


Take z; = —(c — 2i + 1) wherec = t, + s + #¢, and, now, ¢ = 1, 
Then, 
AP = t,t, 1 So S, 
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and hence, 


A 4 hile + 8)" 
1*2 


+ t, 3(c + 


tl, u(t; 1) + 1) 
Now, if ¢ and s are constant, this is a function of the product ,é, 
only and differentiation, assuming continuity, gives a minimum for 
A‘ when, 


tito (t, + — 


the minimum value being 


7 12¢° 
(t, t2) [(t, + ty)” 1] 


Q) 
A™/a‘ 


In practice, of course, it will generally be necessary to approximate 
the solution of (5) in terms of integral values for ¢, and ¢, ; thus for 
c = 10, s = 1, maximum precision with respect to the variance of an 
individual observation is obtained when ¢, is either 7 or 2. The fact 
that maximum precision per individual is achieved with unequal treat- 
ment periods is a special feature of individual curvature analysis which 
may be advantageous when there are limitations on the time or material 
for some members of the treatment set. An explicit solution corres- 
sponding to (5), giving maximum precision in cases when n # 1, s ¥ 0, 
has not so far been obtained. 


SPECIAL CASE 


When the intermediate period between two treatments is zero, the 
above analysis is greatly simplified because, instead of the general 
determinantal expressions, tabulated orthogonal polynomial coefficients 
can be used. Some simplification is also possible if only linear gradients 
are to be allowed for since then, in the usual notation, 


Py - Z, = x = (t, + 
fu both these cases the solution given in equation (1) for (@, — @) may be 


the most convenient form combined with the first tabulation for the 
of variance. 
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RELEVANCE 

In one respect the present study can be considered as a generalisa- 
tion of covariance analysis to admit individual rather than mean 
covariance and the same basic assumption is involved—that treatment 
effects are independent of the covariate. Fundamentally this is a 
question of the appropriateness of the mathematical model to the 
experimental situation which has been discussed by Wilk and Kemp- 
thorne [11, 12] and, as “relevance,” by Cox [7]. In the latter it has 
been suggested that tests are desirable to examine (a) the adequacy 
with which a mathematical model represents the background ex- 
perimental material and (b) the validity of the assumption implicit 
in the model concerning the behaviour of the treatment responses 
including their possiblé interactions with the experimental material. 

For the reevance of the present analysis we require to confirm, 
particularly when temporal rather than spatial trends are being allowed 
for, that there are no treatment effects amongst the regression co- 
efficients. This can be done by calculating the regression coefficients 
within treatment periods for each individual and analysing their differ- 
ences over the complete experiment using a model analogous to that 
given above (4) for the treatment effects. The procedure is illustrated 
in the following numerical example. 


EXAMPLE 


No experiment specifically designed on the above principles is 
available, but the method can be applied to some results of an experi- 
ment by Mr. R. C. Jennings investigating possible dietary effects on the 
hatchability of fowl eggs. In this the experimental unit was a pen of 
30 birds and in all 8 pens were used to give 2 replicates of the scheme 
of Table 3. 


TABLE 3 
EX?vERIMENTAL DESIGN 
Pen 
1 2 3 4 
Period 1 HF HO RF ' RO 
2 RO HF HO 


Here, H and R represent two similar basal diets whilst F and O represent 
fishmeal and a control additive respectively. Hatchability was observed 
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ever fortnightly intervals of which there were 3 in the first, 1 in the 
intermediate, and 4 in the second period. 

The main object of the experiment was the comparison between 
fishmeal and the control additive and, but for a confounding factor due 
to a weekly cyclical rotation of cockerels, was arranged as a simple 
between-pen comparison. Ignoring the confounding factor, the within- 
pen comparison between the two basal diets will be examined here 
solely to illustrate the computational procedures of the above analysis. 
Linear gradients only will be assumed, but the general form of the 
analysis will be shown, 

Results y = sin™’ ~Wp/100, where p is the percentage hatchability, 
for the first pen, together with the x values of the intervals, are given 
in the following Table 4. 


TABLE 4 
OBSERVED VALUES OF & AND Y 
Period 1 (4 = 3) Period 2 (t, = 4) 
—7 —5 1 3 5 7 
y 72.4 68.1 66.8 69.1 64.7 63.6 65.0 
y — 65 7.4 3.1 1.8 4.1 —0.3 —1.4 0.0 


1. Relevance 


lor the relevance test we first calculate the within-treatment 
regressions for each pen and period. Thus for the first pen taking the 
response, sufficiently, as (y — 65) and using orthogonal polynomial 
coefficients, we have 


bh, = 3(—7.4 + 1.8) = —2.80, 
v'o[—3(4.1) + 0.3 — 1.4 + 3(0.0)] = —0.67. 


The difference (bh: — 0,2) is then obtained for each pen, the complete 
series being given in Table 5. 


il 


bie 


TABLE 5 
Pen DivFERENCES (ba — 


| Pen 


1 2 3 4 5 6 7 8 


(ba —2.13 1.43 1.14 2.12 -1.13 -0.57 -—0.30 —0.37 
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The analysis of variance of these results is then obtained as in Table 6. 


TABLE 6 
ANALYSIS OF VARIANCE FOR R&#SGRESSION DIFFERENCES 


Source of Variation d.f. 8.8. m.s. 

{HF — RF 1 2.2500 — 

Regression differences) RO 1 0.0012 
Regression differences X pens 6 11.9533 1.9922 

Total 8 14.2045 _— 


and indicates that the regression differences are not associated with 
treatments in this case. 


2. Individual analyses 
Preliminary calculations taking the response as (y-65) give, 
T,= 123, Si =4=3, Si? =-7-5-—3 = -15, 
Sn = 14.7, =7, 
—34+14+3+4+5+7 =1, 
+ 5° + 3?) + 1 = 167. 


ow A{”, which is constant for all pens, is obtained as 


3 3 —15 
3 7 1 
-15 1 167 
‘vhich reduces to 336. 
Vor we have 
T, 3 —15 
At? 7 1} = 11687, — 5168, + 1088), . 
Si; 167 
Si, is now calculated for each pen as, for example, 
Si, = —7(7.4) — 5(3.1) — 3(1.8) + 1(4.1) 


— 30.3) — 5(1-4) + 70.0) = —76.5. 
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Hence from equation (3) 


— 1480.8/336 


— 
—4.41. 


In routine applications the individual analyses of variance will not 
generally be required; the process for the first pen will, however, be 
given here for completeness. It is first necessary to calculate the 
determinant AS” for which the original responses must be used and not 
those obtained after subtracting the constant (which need not, of 
course, be the same for each pen). Hence to our previous values for 
So, and S,, we now add 65(t, + ¢,) and 65S, , i.e. 455 and 65 respectively, 
to give 


0 14.7 + 455 —76.5 + 65 
Ay’ =| 469.7 7 1 = —36854949.88. 
—11.5 1 167 


The value of the remaining determinant required, which is the sanie 
for all pens, is the coefficient of 7, in the expression already obtained 
for A} and thus the individual items in the analysis of variance may 
now be calculated to give Table 7. 


TABLE 7 
ANALYSIS OF VARIANCE FOR THE First PEN 


Source of Variation df. 8.8. 


Treatments + interactions 1 5.5874 
Regressions including mean 2 31553 .8954 
Unexplained variation 4 12.9872 

Total 7 | 31572.4700 


3. The complete experiment 


Values of (¢, — ¢.) for the complete series of pens, referred to their 
appropriate treatment differences, are given in Table 8. 
The mean values are: 


HF — RF = —3.51, HO — RO = —2.06 


wnd the simple final analysis is in Table 9. 
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TABLE 8 
TREATMENT DIFFERENCES FOR THE E1GHt PENS 
Pen number 
1 2 3 4 5 6 7 8 
Treat- 
ment |HF—RF HO—RO RF—HF RO—HO RO—HO RF—HF HF—RF HO—RO 
differ- 
ence —4.41 -—2.63 -2.25 9.03 -—2.39 2.77 -9.11 1.04 
(dr — $2) 
TABLE 9 
ANALYSIS OF VARIANCE 
Source of Variation d.f. 8.8. m.s. 
HF — RF 1 49 . 2804 
Treatments — RO 1 16.9332 — 
Treatments X pens 6 144.2135 24.0356 
Total 8 210.4271 . — 


Finally, the pooled mean square for the unexplained variation in the 
individual analyses was 4.7978 (32 d.f.) which, multiplied by A“ /A¢” 
to bring it on to a pen basis, gave 16.6781 which compares with 24.0356 
for the treatment X pen interaction mean square. 
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SUR UNE SOLUTION “A PRIORI” DE 
LA METHODE “A POSTERIORI” DE HALDANE 
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Paris, France 


La vérification des proportions mendéliennes en génétique humaine 
se heurte 4 une difficulté statistique tenant au mode de détection des 
familles. 

Lorsqu’il s’agit en effet d’étudier la stinaattilins des homozygotes 
tarés dans la descendance de couples hétérozygotes pour le géne récessif 
‘déterminant la tare, toutes les familles recensées comprennent au 
moins un taré, puisque c’est justement sur Vapparition d’un enfant 
anormal qu’est fondé le diagnostic de ’hétérozygotie des parents. 

Plusieurs méthodes statistiques ont été proposées pour pallier 
cette distorsion. Les unes, ‘a priori,” permettent en utilisant les 
tables de Bernstein ou de Hogben [1] de prévoir le nombre théorique 
attendu dans les cas de dominance (p = 1/2) ou de récessivité (p = 1/4). 

Ces méthodes permettent de savoir si les données sont en accord 
avec la ségrégation mendélienne, mais n’utilisent pas toute information 
disponible. 

Pour ce faire, Haldane [2] a développé et démontré par la méthode 
du maximum de vraisemblance une formule, “a posteriori,” dont 
l’établissement peut se schématiser ainsi: 

Soit p la probabilité d’apparition d’un enfant taré dans une fratrie 
née de parents hétérozygotes. 

Soit q la probabilité complémentaire (p + q = 1) de naissance 
d’un enfant normal. 

Soit le nombre d’enfants dont est composée une fratrie. 

Pour chaque fratrie de taille n la probabilité de ne comporter aucun 
taré est évidemment (q"). Comme le recensement ne décéle que les 
fratries contenant au moins un taré, la fréquence des fratries recensées - 
par rapport au total des fratries issues de parents hétérozygotes est 
(1 — q*). 


*Chargé de Recherche, Centre National de La Recherche Scientifique, Paris, France. 
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IH Sensuit que Ja probabilité de dénombrer des tarés dans chaque 
fratrie ainsi décelée west plus mais — q"). 
Si done nous recensons: 


fe fratriesde 2 enfants 

—4— 


nous devons attendre un total 7’ de tarés égal 4 


n 


(1) 


soit: 


7 1 + 3fs 1 + + 1 n 


q 

Ainsi que l’a montré Haldane [2], la meilleure estimée de p est 
celle qui, tous calculs faits, donnera une valeur de T égale au nombre 
de tarés, 7, , réellement observé. 

Malheureusement, la solution de cette équation ot l’inconnue est 
élevée 4 un dégré supérieur 4 4 est algébriquement impossible. 

Classiquement on utilise alors une solution, soit par itérations 
successives, soit par essais successifs de différentes valeurs de p. 

Le seul reproche que l’on puisse faire & ces méthodes est que les 
calculs qu’elles impliquent sont longs et fastidieux. 

C’est pour éviter cette perte de temps que la Table 1 a été calculée. 

Dans chaque ligne correspondant 4 une valeur donnée de la prob- 
abilité p, on trouve pour chaque taille de fratrie la fréquence théorique 
des tarés attendus. 

Des lors le calcul se déroule ainsi. On multiplie le nombre total 
d’enfants issus de fratries de taille n, soit (n X f,) par la fréquence 
correspondant & une valeur donnée de p, choisie arbitrairement 4 0, 
50, s'il s’agit d’un gene dominant, ou 40, 25, s’il s’agit d’un géne récessif. 

Aprés sommation de tous les termes partiels on obtient un total 
théonique 7, que on compare au total réel T,. Si T, > To, la valeur 
de p choisie est trop grande. On recommence alors avec la valeur de p 
immédiatement inférieure qui permet de calculer une nouvelle valeur 

SiT, < T, < T, , on en conclut que p* est compris entre les deux 
valeurs arbitraires et une interpolation linéaire (par une simple régle 
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de trois) donne une valeur de p approchée 4 mieux d’1/100, ce qui est 
en pratique amplement suffisant. 

Il va de soi que si la premiére valeur arbitraire de p avait donné 
un résultat 7, < T>, on aurait essayé la valeur immédiatement supéri- 
eure de la table, et interpolation se serait fait de fagon inuntigne. 

Reste l’estimation de l’erreur standard o,. de cathe estimée p*. 

Ainsi que Haldane I’a montré, la variance de p, a; , est donnée par 
la formule: 


(2) 


dans laquelle: 


T, = nombre total observé de tarés 

f, = nombre de fratries de taille n 

n = taille de la fratrie 
q = probabilité de la naissance d’un enfant normal 
p = probabilité d’apparition d’un taré. 


Ici un calcul direct est possible puisque p a été précédemment 
estimé. Cependant, le nombre de manipulations numériques est 
suffisamment élevé pour qu’une table, telle que la Table 2 soit d’une 
utilité manifeste. 

Cette table a été établie pour différentes valeurs de p et de n, en 
considérant que la formule de Haldane peut se réécrire: 


1 

qui est un équivalent algébrique de (2) puisque 7, = >-*_, if.(p/1 — q°), 
d’aprés (1). Pour obtenir cette somme il suffit done de choisir une 
valeur de p immédiatement inférieure & p* et de multiplier chaque 
valeur de la Table 2 par (nf,), c’est A dire le nombre d’enfants corre- 
spondant 4 chaque taille de fratrie et d’additionner ces résultats. 

On obtient ainsi une lére valeur S, . 

In effectuant les mémes calculs pour une seconde valeur de p im- 
médiatement supéricure 4 p*, on obtient une seconde valeur S, . 

Dés lors, 2. est compris entre 1/S, et 1/8, . 

Il se trouve qu’en raison de la variation curviligne de la valeur 
1/pql — — ig’ — en fonction de p une extrapolation 
linéaire entre S, et S, pour obtenir la valeur exacte S* correspondant 
a p* conduit A une surestimation légére de l’erreur standard o,.. 

Cependant, cette surestimation étant inférieure 4 1/100, cette 
extrapolation reste parfaitement légitime en pratique. 
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EXEMPLE NUMERIQUE 


Données de Sjégren [3] concernant 56 fratries contenant au moins 
un cas d’ataxie de Friedreich (les familles d’un seul enfant étant exclues). 


Taille de Nombre de Nombre total Nombre total 
la fratrie fratries d’entants de tarés observés 
n f nfn 
2 8 16 9 
3 11 33 14 
4 9 36 ll 
5 9 45 14 
6 7 42 15 
7 4 28 5 
8 2 16 2 
9 4 36 8 
10 2 20 6 
11 2 22 6 
12 1 12 6 


1) Calcul de p* 


Si nous prenons p, = 0.225 et. lisons les fréquences correspondant 
& chaque taille de fratrie dans la Table 1, nous effectuons 


(16) (0.5634) + (83) (0.4209) + (86) (0.3520) + --- + +--+ + (12) 
(0.2361). 

Soit 7, = 95.394, qui est trop faible. 

Une seconde valeur, p, = 0.250, conduit & (16) (0.5714) + (33) 
(0.4324) + (36) (0.3657) + --- + --- + (12) (0.2582). 

Soit T, = 100.492, qui est trop grand. 

Dés lors en admettant que dans I’intervalle de p, & p, , 7’ varie 
linéairement en fonction de p, 

p* = p, + (p. — pr) 


96 — 95.39% 
100.492 — 95.301 


Soit: p* = 0.225 + 0.025 


To = 96 
p 
T, = if, , 
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2) Calcul de o*, 
Reprenant p, = 0.225 et utilisant la Table 2, on calcule: 
S, = (16)(1.820) (33)(2.591) 
+ (36)(3.092) + --- + + (12)(4.985). 
Soit S,; = 1,146.54; et prenant p, = 0.250, 
S, = (16)(1.741) + (33)(2.493) 


+ (36)(2.987) + --- + --- + (12)(4.787). 
Soit S, = 1,108.494. 


En admettant ici aussi que S varie linéairement en fonction de p 
dans l’intervalle de p, A pz , 


— 
S* S, + (S, gj 
P2 — Di 
0.06297 
* = = 
d’ot S 1,146.54 — 38.046 0.025 1 142.017 
1 
= 
et = 1,142.017 et 0.02958. 


On écrira done au total, p* = 0.228 + 0.0295. 


INTERET DE LA METHODE 


Ainsi que le lecteur pourra s’en assurer par lui-méme, la méthode par 
itérations successives conduirait 4 des valeurs identiques, les différences 
ne portant que sur le 4° chiffre significatif. 

L’avantage des deux tables ici présentées réside uniquement dans 
la rapidité du calcul qui demande environ une dizaine de minutes, alors 
que la méthode par itérations successives nécessite plusieurs heures. 


RESUME 


L’étude de la ségrégation mendélienne dans les familles humaines 
est rendue difficile du fait que les familles recensées comprennent 
toutes au moins un taré. La formule a posteriori de J. B. 8S. Haldane 
permet de pallier cette difficulté, mais n’est pas algébriquement soluble. 

Une méthode de solution approximative 4 l’aide de tables est 
présentée, permettant un calcul rapide avec une précision suffisante 
pour les besoins de la génétique. 
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Introduction 


The present paper has in view the intra-block analysis of a group 
of experiments in complete randomised blocks where the treatments 
are not all the same, but one or more treatments are common to the 
whole set. Assume that we have g different experiments in complete 
randomised blocks, each with r replications and k = z + c treatments, 
of which c treatments are common to all experiments. If the residual 
variance estimates are not too different, a joint analysis can be carried 
out for the whole set of experiments, which are then considered as 
forming just one trial with incomplete blocks. A similar situation 
was briefly discussed by Yates [6]. The design to be studied is a special 
case of the intra- and intergroup balanced incomplete blocks, proposed 
by Rao [5]. 

Specific directions for the analysis are given and an example is 
presented where two experiments with Eucalyptus sp. are put together, 
each with 5 replications and 10 treatments (species), one of which 
(£. saligna) was common to both experiments. 


Estimation of Adjusted Means and Analysis of Contrasts 


As stated above, we assume g different experiments in complete 
randomised blocks, each with r replicates and k = z + ¢ treatments. 
Of these, c are common to all experiments, which are, therefore, con- 
nected by them. The total number of treatments is v = gz +c. The 
treatments will be classified into two classes: (1) common treatments, 
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already defined, and (2) regular treatments, which are the others. The 
regular treatments have r replications, while the common ones are 
replicated gr times. The regular treatments are separated into g groups, 
each corresponding to an experiment. 

The whole set of experiments can be taken as just one trial with 
incomplete blocks and our model is the usual Model I of analysis of 
variance for incomplete block designs. The study of the least squares 
equations shows that the estimates for the treatment means of the 
common treatments are the usual arithmetic means of the yields: no 
adjustment is necessary. lor the means of regular treatments of any 
experiment, the adjustment is just to subtract (Mean of common 
treatments in that experiment) — (Overall mean of common treatments). 

In the analysis of variance the sum of squares corresponding to 
treatments (adjusted) can be obtained in the usual way (see, for example, 
Rao’s paper), but it is easier to chlculate it as follows: firstly, compute 
the sum of squares for blocks (unadjusted) and the sum of squares for 
the between common treatments X experiments interaction in the 
usual way, then obtain the sum of squares for error by adding up the 
respective totals in each individual experiment, and finally determining 
the sum of squares for treatments (adjusted) by subtraction from the 
total sum of squares. 

For a contrast Y; = m;; — m,, between two means of regular 
treatments (jth and kth) in the same 7th group, the estimate of the 
variance of its estimate is given by 


V(P,) = (2/r)s’. 
On the other hand, for a contrast Y, = m;; — my, (i ¥ 1), between 
two means of regular treatments in different groups, we have 
V(t.) = (2/01 + 1/0)’. 
Now for a contrast Y; between two means of common treaiments, 
= (2/gr)s’, 
and finally for a contrast Y, between a common treatment and a regular 
one we have 


Vl.) = + (1/e) + (1/9) — (1/eQ)]s’. 


If c = 1, thateis, if there is just one common treatment, then 
P(t.) = (2/r)s? and we may consider only two types of contrasts: 
(1) Contrasts between two regular treatments in the same group or 
between the common treatment and any regular treatment; (2) Con- 
trasts between two regular treatments in different groups. - In this 
ease it seems advisable to use Tukey’s studentized range test. for Case 
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| with g obtained for k = z + 1 treatments; for Case 2. y will correspond 
to gz treatments. 

When only contrasts between the common treatment (taken as a 
control) and a regular one are to be considered, Dunnett’s procedure 
[2] might be preferred, with least significant difference = ¢/P(f,), 
where ¢ refers to the multivariate Student ¢-distribution. 

The analysis now explained assumes obviously that no treatments 
X experiments interaction is present. If this is not so, the most ap- 
propiiate residual mean square for comparing regular treatments in 
different experiments would be the between common treatments X 
experiments interaction. 


Example of Analysis 


We shall analyse as an example a group of two experiments with 
different species of Lucalyptus, carried out by the “Companhia Paulista 
de Istradas de Verro,” in Aimorés, State of Sao Paulo, Brasil. Both 
experiments, located close together in the field, had ten treatments 
(species), in complete randomised blocks, with 5 replications. One 
and only one of the species (Z. saligna), which is the standard one in 
use in that state of Brasil, was common to both experiments. Data 
on firewood production, in cubic meters per plot of 200 square meters, 
were obtained 8 years after planting. They appear in Table 1. Analyses 
of variance are given in Table 2. 


TABLE 1 


YIELD OF Firewoop 1n Cusic Meters, Per or 200 Square Meters 
Seectes oF Eucalyptus 


= 

; ig 

iz 

1 2 3 4 5 6 7 8 9 10 Pie a 

4.8 28 41 3.7 36 #47 #20 25 -3.0 6.1 
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TABLE 2 


ANALYSES OF VARIANCE 
EXPERIMENT No. 1 


Source of variation D.F. S.S. M.S. F 
Blocks | 4 2.307 
Treatments 9 85.127 9.459 28 . 66 
Residual | 36 11.869 0.330 = 

| 
I-XPERIMENT No. 2 

Source of variation DF. 8.8. M.S. F 
Blocks | 4 4.565 — — 
Treatments | 9 108.221 12.025 15.58 
Residual | 36 27.775 0.772 — 


Researches carried out by Box [1] pointed out that when the in- 
equality of variances is moderate, the true variance ratio not over 
3 to 1, the effect on the F test is small if the number of observations 
is the same in all classes. So in the present case it seemed reasonable 
to combine both experiments to allow proper comparison of treatments 
of x group with treatments of the other group. ‘The joint analysis 
of variance is given below. 


TABLE 3 


ComBINED ANALYSIS OF VARIANCE 


Source of variation 


D.F. SS. MS. PF 


Blocks (unadjusted) | 9 28.402 vm 


Treatments (adjusted) 


Residual 


19 193.340 10.742 19.50 
72 39.644 0.551 — 


The means of the 19 species are given in Table 4. 


A 
: 
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TABLE 4 
MEANS OF THE 19 SPECIES 
Treatment numbers | Unadjusted means | Adjusted means 

1 (Common) 4.96 4.96 
3.10 2.98 
3 5.32 5.20 
4 3.98 3.86 
5 3.62 3.50 
6 5.46 5.34 
7 2.70 2.58 
8 2.24 
9 2.84 2.72 
10 6.22 6.10 
11 4.70 4.82 
12 4.56 4.68 
13 4.02 4.14 
14 4.00 4,12 

15 2.92 3.04 , 

16 2.74 2.86 , 
17 1.72 1.84 
18 1.36 1.48 
19 0.42 0.54 


| 
| 
| 
| 
| 
| 
| 
| 
| 


For a contrast Y, between means of two regular treatments in the 
same group or the common treatment and a regular treatmggnt, we have 


= (2/)s? = (2/5) 0.551 = 0.220." 


On the other hand, for a contrast Y, between two regular treatments 
means, each from a group, we obtain 


V(¥.) = (2/1 + = (4/5) 0.551 = 0.441. 


In the case of a contrast between two regular treatments in the 
same group or the common treatment and a regular treatment we may 
use, with the usual restrictions, the ¢ test or the studentized range test, 
with V(Y,) = 0.220. The latter, in the present case, would give by 
Tukey’s method (see Federer [3] and Pimentel Gomes [4]) as least 
significant difference, at the 5% level of probability, 


A = qV(1/2)0(F) = 4.62-V/0.220/2 = 1.53. 


The value of q is taken from the tables, corresponding to 10 treat- 
ments and 72 degrees of freedom for the residusl mean square. 
Of course one might prefer to compute the least significant difference, 
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in this case, with the residual mean square for each group. For the 
first group we should obtain, at the 5% level, 


A = 4.77V 0.330/5 = 1.23, 
while for the second group it would be 
A = 4.77-V/0.772/5 = 1.87. 


Now for a contrast Y, between two regular treatment means from 
different groups, one may take the q value corresponding to 18 treat- 
ments and 72 degrees of freedom. At the 5% level, the least significant 
difference by the studentized range test would be 


A = 5.11V/0.441/2 = 2.40. 
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THE AFTER-HISTORY OF PULMONARY TUBERCULOSIS:* 
A STOCHASTIC MODEL 


Davip W. ALLING 


Cornell University 
Ithaca, New York, U.S.A. 


1. Introduction 


In recent years stochastic models of several chronic illnesses have 
been constructed [1-5]. Such models are an outgrowth of the realiza- 
tion that whereas in an individual patient the clinical course of a chronic 
disease is a sequence of unpredictable events, in groups of patients the 
course of the disease exhibits a discernable tendency toward regularity. 
On the other hand, it is well known that whereas an individual path of 
a stochastic process is not predictable in a deterministic sense, collections 
of such paths tend toward an average frequency distribution of states at 
each step from the starting point. Although there is a suggestive 
parallelism between the clinical course of a long-term illness and the 
path of a stochastic process, the validity of the analogy depends on 
whether the clinical events observed during the course of a chronic 
disease are genuinely random in nature. 

The construction of stochastic models of chronic disease is based on 
several simplifications of ordinary clinical experience. First, a clearly 
defined starting point in time must be chosen (e.g. diagnosis, completion 
of treatment, relapse). Second, clinical states must be defined in such 
a manner that at a predetermined set of times following the starting 
point the patient can be classified as being in one, and only one state. 

‘ When a stochastic model consists entirely of states between which, 
if communication exists at all, instantaneous transition is possible, 
then the model may be made time continuous, i.e. at any moment after 
the starting point there is defined a probability of being in any given 
state. Very commonly in chronic disease, however, attention is focussed 
on clinical states which under certain circumstances preclude instan- 
taneous transition from one to another. The model discussed below 
contains two such states, arrested pulmonary tuberculosis and active 
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pulmonary tuberculosis. The definition of arrested pulmonary tubercu- 
losis requires that certain conditions be present during a six-month 
period before entrance into the state is possible, otherwise the disease 
continues to be classified as active. It is clear that, if at a given time one 
of these conditions is absent, then the state “arrested pulmonary 
tuberculosis” is impossible during the next six months. Hence at 
certain points the probability function 


p| Patient has arrested | Patient had active 
disease at time t, | disease at time ¢, 


| 


may lack a derivative and may well be discontinuous. In any case it is 
not a stationary function, which is a serious drawback. [A function of 
two arguments f(x, y) is said to be stationary if f depends only on the 
difference x — y.] 

However, if inquiry as to the state of the stochastic system is made 
only at specified times following the starting point, there will result a 
discrete stochastic process with discrete transitional probability func- 
tions about which no questions of differentiability or continuity will 
urise. Moreover, the discrete process may happen to be stationary 
even though the process resulting from continuous consideration of the 
system is not. With the present model this is indeed the case. 

In general, discrete stochastic processes permit greater freedom in 
defining states than do continuous processes. In the latter the use of 
such indispensable clinical definitions as “signs of disease present’”’ and 
“signs of disease absent” make little sense when it is realized that such 
use would result in a positive probability being assigned to the event 
“jens of disease are absent at time ¢ given that signs of disease were 
present one minute beforehand.” Finally, it is noted that customarily 
the time period during which a chronic illness is under study is sub- 
divided into intervals of convenient length and the clinical data are 
grouped accordingly. Hence, in the analysis of such data, it would 
seem natural to use a discrete stochastic process and a discrete esti- 
mation procedure since no sacrifice of information is involved. 

The stochastic processes so far studied in the biological sciences 
have usually been Markovian although recently a non-Markovian 
process has been proposed by Cole [6] to describe fluctuations in the 
size of animal populations. Among the general class of Markov pro- 
cesses the most convenient for practical purposes is the stationary 
variety, ie. one whose transition probability functions 


System is in state | System was in state |, s,t > 0, 
E,attimes+t FE; at times 
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depend only on t. However, non-stationary processes, if simple enough 
in structure, may lend themselves to analysis. Berkson and Gage [2] 
describe a non-stationary process which consists of two transient 
(unessential) states (living with uncured cancer, living with cured 
cancer) and two absorbing states (dead from cancer, dead from ordinary 
causes). Since there is no communication between the transient states 
the process is essentially the sum of two subprocesses, one consisting 
of a transient state and an absorbing state, the other of a transient 
state and two absorbing states. 

In order to study transition systems of a more complicated structure, 
particularly those allowing two-way communication between transient 
states, it seems almost a necessity at the present time to resort to sto- 
chastic processes of a stationary nature. Accordingly, the process to be 
discussed here will be stationary as well as being discrete and Markovian 
and consequently will have constant first-order transition probabilities. 

A necessary condition that a continuous function be stationary is 
that it be convex downward. The ordinary survival curve (/, of an 
actuarial table) is convex upward [7] in the age range which concerns 
us (15-65 years) and hence is not stationary. Since the transition 
“alive — dead of ordinary causes” occurs in the data to be discussed 
and since inclusion of this transition in the model would introduce a 
non-stationary function, the question arises as to how such events are 
to be treated. We will eliminate this transition by arbitrarily ending 
the period of observation at the point of time just prior to that at 
which death from ordinary causes is recorded. Since the event “avoiding 
death from tuberculosis” and the event “avoiding death from other 
causes” are usually considered to be independent, the proposed model 
will then furnish predictions subject to the condition that deaths from 
causes other than tuberculosis do not occur. 

The validity of this procedure depends in part on the accuracy of 
the classification of the cause of death. In this connection Berkson and 
Gage [2] have rightly pointed out that in some cases the true cause of 
death is by no means so easy to determine as the fact of death. However, 
if the number of deaths ascribed to causes other than tuberculosis is 
compared with the expected number computed from an acturial table 
and no appreciable difference is found, then there is reason to believe 
that no gross error in classification has been made. The clinical data 
{8, 9] used for illustration of the present model have been scrutinized 
in this manner and the results are shown in Table 1. In view of the 
close agreement between the observed and expected numbers of deaths 
from causes other than tuberculosis, it would appear that no serious 
error was made in the classification of the cause of death. 
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TABLE 1 


CoMmPARISON OF OBSERVED AND Expectep! DEATHS AMONG PATIENTS WITH 
PuL_monary TUBERCULOSIS 


i No. Deaths—Other Causes 
Stage of Disease ' No. Tuberculosis 
at Diagnosis | Deaths Observed Observed Expected 
Minimal? 15 3 1.8 
Far advanced 
Age less than 45 yrs. 52 3 1.2 
Age more than 45 yrs. | 92 | 2 5.6 


'The expected number of deaths from causes other than tuberculosis was computed by subtracting 
the expected number of deaths from tuberculosis (10) from the expected number of deaths from ail 
causes (7). 

*In order that the time period for admission to the study of minimal tuberculosis might correspond 
to the time period for admission to the study of far advanced tuberculosis (which was 1938-1948), 
9 patients with active minimal tuberculosis diagnosed in 1948 were added to the original clinical 
material while 13 patients whose diagnoses fell in 1937 were excluded. 


We will now review some follow-up data from a study of pulmonary 
tuberculosis [9] which have been interpreted in accordance with the 
Diagnostic Standards [1940 edition} published by the National Tubercu- 
losis Association. Since the use of the Standards in this connection has 
been discussed elsewhere [11], we will merely state here that usually a 
person with “active” pulmonary tuberculosis is ill and in need of treat- 
ment, whereas a person with “arrested” tuberculosis ordinarily is fully 
capable of leading a normal life. The starting point of the study is the 
date of diagnosis. On this date and at succeeding anniversaries each 
patient is classified as having either active tuberculosis (A) or arrested 
tuberculosis (R) or as being dead of tuberculosis (D). If the patient 
dics of causes other than tuberculosis or if he is lost to follow-up obser- 
vation, then the series of classifications terminates at the anniversary 
of diagnosis just preceding either of these events, otherwise the classi- 
fications continue until the date on which the study closes. 

The follow-up data pertaining to an individual patient are thus 
summarized by a row of letters (e.g. A A A R A R A D D) which ad- 
mittedly is a crude representation of a complex and subtle disease 
process. Pairs of adjacent letters in the row represent transitions 
between states, the possible transitions being: A —~ A, R — R, D— D, 
A, A— Rk, A— D. The last three transitions are commonly 
termed relapse, remission, and death, respectively. The transitions 
D — A and D — R are obviously impossible and the transition R — D 
“an be regarded as impossible since it occurs so rarely in ordinary 
clinical experience. Inasmuch as a full year elapses between classi- 
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fications of state the question arises whether, if a shorter period of time 
were allowed between classifications, two or more changes of state 
could occur during an annual interval. A careful re-examination of the 
data on which this study is based turned up only occasional instances 
of such multiple changes of state during a twelve-month period. 

The most obvious way to construct a stochastic model of tuberculosis 
is to use the three clinical states: active tuberculosis (A), arrested 
tuberculosis (R), and dead of tuberculosis (D). Since we are con- 
sidering only Markov chains with constant first-order transition 
probabilities, one of the direct consequences of ‘this classification of 
states is that the follow-up experience represented by the pattern of ¢ 
consecutive R’s followed by an A (R --- R A) will have a probability 
function of the form p‘'(1 — p) where p is the probability of the 
transition R — R. In general, the estimate of the probability of this 
pattern of letters is known as the “tth year relapse rate.” 


TABLE 2 — 
Rates (IN Per cent)—MInIMAL PULMONARY TUBERCULOSIS 
Observed 3.8 2:9 25 1.4 0) 
Predicted—p*"(1 — p) 2.1 2.0 2.0 1.9 1.9 1.9 1.8 18 1.8 
Predicted—kp*-(1 — p) 4.3 2.7 1.6 1.0 0.6 0.4 0.2 0.1 0.1 


In Table 2 is shown a set of relapse rates observed among patients 
who had minimal pulmonary tuberculosis at the time of diagnosis [9]. 
In the second line of the table are shown the relapse rates predicted by a 
function of the “.zm p'"'(1 — p). The fit is clearly very poor. However, 
if we imagine that only a fraction, say k, of the people with arrested 
minimal tuberculosis are ever at risk of relapse, we are led to a function 
of the form kp‘~'(1 — p) whereO0 < k <1. The relapse rates predicted 
by a function of the latter form are shown in the third line of the table 
and these are seen to correspond reasonably well with the observed 
relapse rates. Unfortunately. the function kp‘~'(1 — p) is not stationary 
since 


ee has a relapse 
in (s + ¢t)th year 


Patient had = p) 
disease in the sth year] —k-+kp* 


Patient had arrested 


Patient has a relapse 
P| disease at the outset 


in the tth year 
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Thus, we have on the one hand a poorly fitting stationary function and 
on the other hand a well fitting non-stationary function. 

We can circumvent this difficulty by resorting to an expedient due 
to Boag [1], namely of subdividing the states of our model in order to 
obtain transition probability functions of a more convenient form. For 
example, instead of the state R (arrested tuberculosis), let us consider 
two substates: R, , arrested tuberculosis that is certain eventually to 
relapse and R, , arrested tuberculosis that will never relapse. The 
probabilities of the transitions R, — R, and R, — A are defined to be 
p and 1 — p respectively. On the other hand, the probabilities of the 
transitions R, — R, and R, — A are defined to be 1 and 0 respectively. 
Among the patients with arrested tuberculosis the proportion whose 
disease is in state R, is defined to be k while the proportion whose 
disease is in state R, is 1 — k. 

For patients whose disease is of type R, and hence is certain to 
relapse, clearly the following transition probability function is stationary: 


Patient had arrested | _ ped — p) 
disease in the sth year} 


Patient had — 
disease at the outset 


pe has a relapse 
in the (s + ¢)th year 


Patient has a relapse 


On the other hand, for patients whose disease will never relapse, the 
corresponding transition probability function is identically zero and 
hence also is stationary. In a. similar manner it is easy to see that the 
two remaining probability functions associated with the transitions 
R, — R, and R, — R, also are stationary. 

Now consider a patient about whom we know nothing except that 
his tuberculosis is arrested. The probability that his disease will relapse 
in the tth year is k-p'"'(1 — p), the probability that his disease is of 
type R, times the conditional probability that the disease will relapse 
in the tth year given that it is of type R, .. This is precisely the function 
which previously was seen to fit well with the observed data. 

An analogous approach may be made to the state A (active tubercu- 
losis), dividing it into three substates: A, , active tuberculosis that is 
certain to remain active indefinitely; A, , active tuberculosis that is 
certain to become arrested; A; , active tuberculosis that is certain to 
terminate fatally without ever becoming arrested. We will defer 
until later the question as to whether the distributions arising from these 
subdivisions of the original state A fit well with the observed data. 

Having completed these rather lengthy preliminaries we now 
proceed to a formal description of the model. 


ng 
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2. The Model 


The notation and terminology used in the following discussion are 
drawn largely from Feller [12]. We define the following states: 


E, Active tuberculosis that will remain active indefinitely. 
E, Arrested tuberculosis that will remain arrested indefinitely. 
E, Tuberculosis that has proved fatal. 
E, Arrested tuberculosis that is certain sometime to become active. 
_E, Active tuberculosis that is certain sometime to become arrested. 
E, Active tuberculosis that is certain sometime to be fatal without 
ever becoming arrested. 


The probability of the disease being in state E, at the time of 
diagnosis will be denoted by a, k = 1, --- , 6, and the conditional 
probability of the disease being in state Z,; on a particular anniversary 
given that it was in State HZ; on the preceding anniversary will be 
denoted by Dii 

The matrix of the transition probabilities is: 


0 1:0 
Par O Pas Pass Das C-B 


Since we shall be interested in pulmonary tuberculosis that is 
active at the time of diagnosis, we will set a, = a, = a, = 0. Writing 


a 00:00 0 


10 00:00 ag, 


we see that the probability of the disease being in state EZ; on the ‘th 
anniversary of diagnosis is the jth column sum of the matrix AP’. 
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‘ It can be shown that under rather mild restrictions on the transition 


probabilities the characteristic values of B are less than unity in absolute 
value. Hence J — B is non-singular and 


A, 0 


Now each of the submatrices A, , Az , J, and C is a diagonal matrix 
and B may readily be diagonalized by standard methods. Inasmuch 
as the characteristic values of B are less than unity in absolute value, 
B‘ —0ast— o and the possibility of a periodic chain is eliminated. 

Designating the characteristic values of B by A, , Ax , and A; = pos 
we find for the elements of AP’: 


=a, 


— AsPs2 EE — Pss) — pa) | 

= — — — Poo)r — da) 


(1 — — Pos)(Ar. — Az) (1 — — Dos)(A2 — Pos) 1 
= — Pee); 
a 
[A201 — pss) — — Pss)], fi 
t t t 
(Ar — Pos)(A1 — Av) (Ae — Pos)(Ar — Az) ti 
ry 
t 
= AcPoo - fi 


Returning to the matrix P we see that the only transition prob-- ti 
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abilities that are associated with changes from arrested tuberculosis 
to active tuberculosis are p4; , pas , ANd pyg . These will be assumed to 


be proportional to the corresponding initial probabilities, or in other 
words 


The available follow-up data, although meager in number, lend support 
to this assumption which is the simplest that can be made about py, , 
Pss » 2nd pss . We Shall see later that the estimation procedure is 
appreciably facilitated by its use. Now the row sums of P must be 
unity and the same is true of the sum of the initial probabilities. Hence, 
Pu + Pas + Pas = 1 — pay and a, + as + de = 1 so that we can write pa = 
a,(1 — pas), Pas = — pag), and Pas = — pas). In a similar 
manner, the probabilities p;. and p;, associated with changes from active 
tuberculosis to arrested tuberculosis are written (1 — r) (1 — pss) and 
r(1 — pss), respectively, where r is the proportion of patients who will 
ever relapse. 

Having made these assumptions we can easily show that, if r, pu , 
Qs , Pss » Ge » ANd Poe are known, then all of the probabilities in A and P 
can be calculated. 


3. Estimation 


We shall now derive maximum likelihood estimators of the six 
parameters noted above. The choice of this method of estimation is 
not wholly arbitrary since an attempt was made to derive estimators 
of the type described by Neyman [13]. It was found, however, that the 
multiplicity of restrictions imposed by the conditions under which 
the data were collected led to systems of linear equations of unmanage- 
able size. 

Our data consist of rows of letters (A, R, D) which denote the 
clinical status of tuberculosis in a patient at successive anniversaries 
of diagnosis. Since each pair of adjacent letters defines a transition, it 
follows that to each row of letters there corresponds a sequence of 
transitions. In computing the probability of a particular sequence 
of transitions it is convenient to break up the sequence into runs of 
transitions. 

For example, the follow-up data represented by the row of letters 
AAARRAAAAAcan be broken up into three runs of transitions 
as follows: A The 
first run is followed by the transition A — R which implies that at the 
time of diagnosis (beginning of the row) the patient’s disease was in 
state E, , an event which has (initial) probability a; . The disease 
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remains in E, for two anniversaries with probability p?, , hence the 
probability of the first run is agpi,. The transition A — R occurs at 
the beginning of the second run end we know that the 2 represents 
the state E, since the disease becomes active again after the completion 
of this run. The remaining transition of the run, R — R, has prob- 
ability p., and hence the probability of the run is pupa. The final 
run of transitions R —- A — A—» A — A — A is terminated by the 


TABLE 3 
Runs or TRANSITIONS PRoBaBILsTIES AND OBSERVED FREQUENCIES 


chose of the follow-up stady and we are uninformed as to which of the 


Type of Run 


Letter 
Following Run 


Number of 
Runs Observed 


Middle runs 
A-R-—- --- 
| 
jt 
| 
j times 


End runs : 


j + 1 times 


j+ltimes . 


| 
i + 1 times 


| 
a 
Single run 
| 
j + 1 times 
j times 
j times 
nj 
pupis' 
Puplr + pupls + pup) mf 
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states of active tuberculosis (EZ, , Z; , or E,) the disease has passed. 
Hence the probability of the last run is papi, + pasD5s + DasDoc - 

In Table 3 are shown the probabilities associated with each of the 
possible runs of transitions. It is obvious that any sequence of transi- 


tions can be broken up in a unique fashion into runs of the type listed. 
The likelihood function is 


+ DssPial”” [papi + paspis 


where s is the length of the longest run observed. It is noted that the 
foregoing likelihood expression is a general one and does not depend on 
the assumption (pa: : Pas : Pas Gs Gg) Made in the previous section. 

We now apply this assumption and express the likelihood in a form 
more convenient for the purpose of deriving estimators. Writing 
have clearly m{? + m‘? + m;® = n, since each of the runs enumerated 
by the letters on the left is initiated by an R and the total number of 
runs ending in R, excepting end runs, isn; . By similar arguments we 
have n; + n* = m, and m; = d*. Thus 


[apis + aspis + + paspis + 


= — as — a5 + aspis + aspie]™ 


[1 — as — + aspis + 
— pal? 

= [1 — as(1 — pis) — — — 

= faspis']™ — 
+ = [1 — — pid] — 
and = [1 — 
The likelihood may now be expressed: — 


L = I [rpis"(1 — pes) — (1 — poo) 
— rl — [1 — — pi.) — — 


‘te 
— 
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The maximum likelihood estimates for r and p,, satisfy the follow- 
ing equations, provided that >°}_, n; > 0 and )>;_, n* > 0, 


Yn, Yn, 


Pas) TIP s4 


If the inequalities are not satisfied, a trivial form of the equation may 
easily be written down directly from L. Rearranging and simplifying 
the first equation, we have 


— — (nj + =N, 


say, so. that 
f= 
N n* Dis 


Approximate solutions to the foregoing likelihood equations may be 
found by the following iterative method. Let 


i=l i=l i=l 


i=l 
Oe = 


n*n', 


j=1 a,(1 mi) 


Yn, 


i=l 


and let 


= 


i=l 


k = 0,1, 2,--- . It can be shown that the sequences {a,} and {z,} 
are increasing and have as limits f and p,, , respectively. 

The maximum likelihood estimators for a; , pss , and fe. Satisfy the 
following equations, provided that }-‘_, m; > 0, >>3_, m; > 0, and 
Laws m* > 0, 


| m*(1 — pis) 


| 
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> fi, 
441 — 4,(1 — pis) — — 


Pos = 1 — {1 


Adding together the first two equations, we have 


m* 


— — pis) — — fies) 
where M = )°‘_, (m; + fi; + m*). Hence 
xm, 
ds = ist 


m*pis 


fi, 
a, e int 
M— 2 — — 
As before, the likelihood equations can be solved approximately by 
an iterative procedure. The sequences {6,}, {m}, {5.}, and {r.} are 
defined as follows: 


Bo = > m/M, Po = > 


dm; 
: m* 


= 
7 
> 
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pss = 1 — 
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i=l 
P+ = s 7 
M — m* ‘Th 
B.(1 = = Ti) 


ms 
64. = 1- = 


= m*B,j81 


> 


To = 1 = ’ 
~ - - all — 
k = 0,1, 2, --- . The author has not succeeded in showing that the 


Sequences {Bx}, {pe}, {6}, and {te} converge to Gs Dss dg ’ and Des 
respectively. In a dozen or so applications of the iterative procedure 

i just described, however, the sequences appear to converge with moderate 
rapidity toward values satisfying the likelihood equations. 


4. Examples 


aa The model described in Section 2 furnishes predictions of the expected 

: proportion of patients with active tuberculosis, the proportion with 
arrested tuberculosis, and the proportion dead of tuberculosis at yearly 
intervals after a starting point in time. For example, if we add together 
the entries of the first, fifth, and sixth columns of the matrix AP‘, we 
obtain the probability that a patient will have active tuberculosis ¢ 
years after diagnosis, subject to the initial conditions that with prob- 
ability a; the patient’s tuberculosis is in state E; , i = 1, 5, 6. pom 
we can write 

P Disease is active at the 


tth anniversary 
a, as | a;(ps, + pss + + 
and similarly 
Disease is arrested at © 4 pi?) 
tth anniversary |“ a | as(Psz 


Disease has proved fatal at or 
before the éth anniversary 


(t) (t) 
a, , as = Asps3 + - 


] | 
| 


A STOCHASTIC MODEL FOR TUBERCULOSIS . 541 


Follow-up data from a study of minimal tuberculosis [9] were used : 
to compute estimates of r, Pu , Gs , Pss , 2s , Poo - In obtaining approxi- 
mate solutions to the likelihood equations by use of the iterative method 
described above, the author found that from four to eight iterations were | 
necessary to achieve constancy in the third decimal place. The crude 
data are shown in Table 4. The approximate maximum likelihood 
estimates were found to be 


r= 2234, a, = 8648, a, = .1305, 


Pua = .6752, DPss = .6006, Pes = .8338. 
TABLE 4 
Cruve Data From Foittow-Up 


Srupy or Minima Putmonary TUBERCULOSIS 


10 


nf | 12 7 9 9 10 9 9 15 i0 0 


From these the remainder of the parameters were calculated and 
substituted into the expressions for the three conditional probabilities 
noted above. The complete set of values is shown in Table 5. Using 
the follow-up data from the same study, direct estimates were made of 
the proportions of patients with active tuberculosis, with arrested 
tuberculosis, and dead of tuberculosis at each of ten anniversaries of 
diagnosis. The computations were performed according to a method 
described by Bosworth [11] and are shown in Table 5. The direct 
estimates are compared with the model prediction in Figure 1. There 
appears to be a reasonably satisfactory agreement between the two 
sets of values. 

Alternatively, the agreement between the model predictions and the 
empirical data can be assessed by comparing the number of remissions 
and deaths predicted to occur after various durations of active tubercu- 
losis with the number of remissions and deaths actually observed. 
Similarly the predicted number of relapses after varying durations of — 
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Figure 1 ‘ 
OBSERVED vs. PREDICTED VALUES 


MINIMAL PULMONARY TUBERCULOSIS 


—— OBSERVED 

PREDICTED 

60 €0 

40 

20 ACTIVE 20 

DEAD OF TUBERCULOSIS 
DIAG. 1 2 3 4 5 6 7 8 9 0 


ANNIVERSARY OF DIAGNOSIS 


arrested tuberculosis may be compared with the observed number. . 
The predicted values of m,; , %; , and n,; are shown in Table 6, and these 


may be compared with the corresponding empirical values shown in 
Table 4. 


TABLE 6 


PREDICTED VALUES OF 7; , , 
MINIMAL PULMONARY TUBERCULOSIS 


nj 8.4 3.2 0:4. 03 02 0:0 


m;| 48.7 29.3 17.1 9.2 4.7 2.5 


3.2 2.7. 2.2 1.9 13 O8 05 O38 0.2 


The agreement between the predicted and observed values of m, , 
m,; , and n; does not appear to be so satisfactory as that between the 
sets of predicted and observed values shown in Figure 1. The most 
noticeable discrepancy is that occurring between the predicted and 
observed values of m; and for this there is a plausible explanation. The 
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numbers m; are enumerations of transitions from the state of active 
tuberculosis to the state of arrested tuberculosis. Now according to 
the diagnostic standards of the National Tuberculosis Association the 
disease must be classified as active so long as there is any change in 
the serial chest roentgenograms, whether the change be favorable or 
unfavorable. Hence the clinical status ‘‘active tuberculosis” is a 
markedly heterogeneous classification since it includes both the patient 
who is nearly dead of tuberculosis and the patient who is nearly well. 
If the persons classified as having active tuberculosis are divided into 
two groups according to whether the roentgenographic changes are 
favorable or unfavorable, then a more satisfactory homogeneity is 
realized and the yearly number of observed remissions in each group 
corresponds more closely to the model predictions. Due to the fact 
that the NTA standards are widely accepted, the clinical definitions 
contained in the standards were used without alteration in the present 
study. 

We turn now to the appraisal of a group of patients who had far 
advanced active tuberculosis at the time of diagnosis [8]. During the 
period of time in which this group was under study (1938-1951), far 
advanced tuberculosis, which involves an extensive portion of the lung 
tissue, healed quite slowly since highly effective therapy was not yet in 
general use. Hence there was an appreciable number of persons whose 
serial chest roentgenograms showed improvement over a protracted 
period of time. As a result the transition, active — arrested, was 
rarely encountered in the first year following diagnosis. 

We must therefore modify the model described in Section 2 in order 
to account for this fact. One way of accomplishing this purpose is to 
add an “initial” state to the model defined as follows: 

E,, Active tuberculosis that is certain to be found in state EZ, on 

the first anniversary of diagnosis. 

The matrix of the transition probabilities is: 

0: 0 0 0 


Po O O : Pas Des O C* Bt 


0 Ps2 O ps Pos 0 0 


“ 


{ 
P 
| 
@ 
0 O Pes 0 O po 0 
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while the matrix of the initial probabilities is: 


00000 01] 
0 


A* = 


oo co fc .¢ 
oo o 
ll 


— 
ooo 
coc ooo 

a 


so that 


| ° 
— B*) "(I — B*)C* 


The matrix B* may readily be diagonalized and the indicated matrix 
multiplications then carried out, thus obtaining the elements of A*P**. 
However, the tth-order transition probabilities may be derived more 
easily by noting that any path originating in state EZ, will certainly 
pass through E, on the first anniversary and thence, because of the 
Markoff property, will have a tth-order transition probability equal to 
the (¢ — 1)st-order transition probability of a similar path starting from 
E; in the original model. On the other hand, any path not originating 
in state #, has precisely the same higher-order transition probabilities 
as in the original model. Hence we may write 


(t) (t) (t) (t-1) (t) (t—1) 
(t) (t-1) (t) (t-1) (2) (t-1) 
(t) (t-1) (t) (t) (t) (t) 
atpi; = , = = AsPes 
Corresponding changes must be made in the maximum likelihood 
estimators. 


It was found that the model predictions relating to the group of 
patients ill with far advanced tuberculosis deviated appreciably both 
from the actual observations as well as from the direct estimates based 
on these observations. This lack of correspondence raised the possibility 
that some variable factor existed among the group of patients which 
had an appreciable effect on the follow-up experience. Since one of the 
main conclusions of the study was: “Age has a profound influence on 
the clinical outcome of far advanced tuberculosis, old persons having a 
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case fatality more than twice that of young persons,” it was decided to 
divide the patients into two subgroups: those whose age was less than 
45 years at the time of diagnosis and those whose age exceeded 45 years 


Figure 2 
OBSERVED vs. PREDICTED VALUES 
FAR ADVANCED TUBERCULOSIS - AGE OVER 45 
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Figure 3 
OBSERVED vs. PREDICTED VALUES 
FAR ADVANCED TUBERCULOSIS - AGE UNDER 45 
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at diagnosis. The results of the computations are shown in Figures 
2 and 3. It is to be seen that reasonably good agreement exists between 
the model predictions and the direct estimates based on the obervations. 
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APPLICATION OF QUANTAL-RESPONSE THEORY TO 
THE CROSS-COMPARISON OF TASTE-STIMULI 
INTENSITIES* 


N. T. Gripceman 
Division of Applied Biology, National Research Council 
Ottawa, Canada 


SUMMARY 


Coded pair comparisons of the relative intensity of 4 primary-taste 
stimuli are subjected to logit analysis. The tie-in of the model with 
standard psychophysical concepts is demonstrated. Estimation of 
relative intensity (the mean results are: sucrose: sodium chloride: citric 


acid: quinine sulfate = 1:14:220:2300) and the evaluation of errors are 
discussed. 


INTRODUCTION 


Every sense impression has an intensity component, and the gustom- 
etrist may ask whether the intensities of qualitatively different taste 
stimuli can be measured on a common scale. Is it in principle assertable 
that the taste of one compound is z times that of another, entirely 
different, compound? Methodological difficulties inhibit a ready 
answer; moreover, there is a personal equation to be reckoned with. 
Nevertheless, some progress has been made. Twenty years ago, Bujas 
[4], using a Geschmackslupe, concluded that sucrose and sodium chloride 
were so comparable. More recently, Beebe-Center and Waddell [1] 
submitted evidence that all four primary tastes could be regarded as 
co-metrical, and they went on to propose that the subjective strength 
of 1% aqueous sucrose be made a general unit of taste intensity, to 
be called the gust (which, incidentally, is already a meteorological 
unit!), a concept of interesting theoretical and practical potentialities. 
The present paper tackles the problem on a new, probabilistic, basis 


wherein the theory of quantal response bioassay is coupled with the 
method of pair comparison. 


*Contribution from the Division of Applied Biology, National Research Laboratories, Ottawa, 
Canada. Issued as N. R. C. No. 4887. 
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THEORY 


If a subject is given two different concentrations, C,; and C, , under 
code and in random order, of the same solute, and he is asked, ‘‘Which 
tastes the stronger?”’, we assume the probability P of, say, the “7” 
answer to be sigmoidally related to the logarithm of the concentration 
ratio. This assumption is linked to Thurstone’s “phi-log-gamma” 
hypothesis of the psychosensory response system [8]. Thurstone took 
the sigmoid to be the cumulative normal function, but for analytical 
purposes it is preferable to use the logistic or autocatalyiic function— 
which in this context is operationally indistinguishable from the normal. 
So we may write 


P = {1 + exp [a + B log (C,/C;)]}*- (1) 
from which the linear relation 
Y = In (P/Q) = a + B log (C,/C,) (2) 


derives. In these equations a has an expectation of zero, and @ is a 
parameter of sensitivity, and Q = 1 — P. If, at each of n different 
values of C,/C; , we replace P by R/N (the proportion of specified 
answers in N-plicated trials),.then the method of least squares, with 
weighting, will yield, as Berkson has shown [2, 3], a consistent and 
asymptotically efficient estimate of 8. However, for a given C,/C; , it 
does not necessarily follow that the corresponding 8 will be independent 
of the numerator and/or the denominator. 

In sensory perception, acuity is usually expressed as the Weber 
fraction, which is the smallest detectable stimulus difference as a 
fraction of the lesser, “standard,” stimulus. If, at this point, the 
lesser stimulus concentration is C;, and the greater is C;, , the Weber 
fraction, which we shall denote by F, will be C;,;/C;, — 1, and is therefore 
easily seen to be a function of the regression coefficient 8, which is 
itself a measure of the steepness of the slope of the underlying sigmoid. 
If we invoke a probability P, = 2P — 1 of sensory discriminability 
[6], the smallest detectable stimulus difference can be objectively 
defined as that for which P, = 1/2. It follows that the Weber fraction 
will be 

F = antilog (3) 

For a given stimulus, the expectation Y = 0 occurs at C; = C; . 

If, however, the comparisons are between two qualitatively different 


stimuli, and if rational ranking-intensity decisions are still feasible, 
the concentration ratio at zero Y will be the reciprocal of the taste- 
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intensity ratio of the pure stimuli, weight for weight, and this is our 
experimental objective. Goodness-of-fit tests will supply criteria of 
rationality. 


EXPERIMENTAL 


Four subjects were used. Preliminary trials with aqueous solutions 
of sucrose, sodium chloride, citric acid, and quinine sulfate (representing 
the four primary tastes) indicated that cross-comparisons of relative 
intensity were practicable. Eventually the solute concentrations given 
in Table 1 were settled on for the experiments proper, in which sucrose 
was to be matched against each of the other three substances. The 
exploratory work amounted to a search for the nonasymptotic area of 
application of equation (1). 


TABLE 1 
Aqurous CONCENTRATIONS OF STIMULI 
Sodium Citric Quinine 

Sucrose chloride acid sulfate 
ref. ppm ref. ppm ref. ppm ref. ppm 
S1 15,000 | Sc-1 1000 Ci-1 75 Qe-l 8 
S-2 19,500 Se-2 1500 Ci-2 150 Qs-2 i2 
S-2a 23,400 Se-3 2250 Ci-3 220 Qs-3 16 
8-3 25,400 Se-4 3325 
S-4 33,000 


The sodium chloride trials were made by all four subjects; the 
others, carried out some months later, were made by two of the subjects. 
There was one difference in conduct: whereas the sodium chloride 
trials were so arranged that each subject had to make 16 pair com- 
parisons per session (two sessions per day), the later trials were con- 
certina’d, as it had meantime been discovered that such tasting 
chores could be made much heavier without any fatigue-induced change 
in P, . So in the citric and quinine trials each subject did 96 comparisons 
per session (which took about 20 minutes) and 3 sessions per day. The 
subjects preferred these ‘gustathons” to the spreading of the work over 
many more days. 

Each session comprised an integral number of blocks. Each block 
consisted of a double set of all the different pair comparisons in the 
particular experiment, the two sets being chiral complements—that is, 
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in one the sucrose member of the pair was always on the left-hand side, 
and in the other, the right-hand side. The order of tasting of the 
pairs within each block was randomized. Appraisals were made in 
air-conditioned, distraction-minimized, single-seater booths. The 
invariable question was, “Which member of the pair has the stronger 
taste?” After being decoded, the answers were recorded as 1 or 0, 
according to whether or not the decision was in favor of the sucrose. 

As the three experiments each had a natural (time interval) split, 
two “series” of results were distinguishable, and these were treated as 
separate sub-experiments. 


RESULTS 


Throughout, FR will be defined as the number of decisions for sucrose 
in N replicated trials by a given subject. Subscripts lh and rh will 
denote the position, left- or right-hand, of the sucrose, so that Ry, + 
Ry = R. 

The. raw data are presented in the form of a checker-diagram in 
Figure 1. So set out, they lend themselves to tests for statistical 
homogeneity (in the sense of the desirable independence of rows and 
columns) for which Cochran’s Q test [5] is well suited, and to tests for 
secular trend. The horizontal subtotals will yield information on 
inter-series differences and on chiral bias. This bias can be quantified 
[7] in the form (R,, — Ry)/(2N — R,, — R,,), which is an estimator 
of the probability of the right-hand decision (or, if negative, the left- 
hand) independenily of the sensory decision. (Chiral bias cancels 
when, as here, the design is appropriately balanced.) 

Apart from stating that such “independence” tests as have been 
applied indicate a reasonable inter-block homogeneity within series 
and subjects, we shall not give any more space to this aspect of the 
work. There is some evidence of inter-series differences and strong 
evidence of inter-subject differences. 


STATISTICAL ANALYSIS 
Sodium chloride results 


Because this first experiment embraced all 16 cross comparisons of 
the 4 levels of both stimuli, it can. be regarded as two series of inter- 
locking comparisons of (i) graded C; = sucrose against fixed C, = 
sodium chloride, and (ii) vice versa. Estimates of 8 can therefore be 
made for both (i) and (ii) and checked for stability and identity. The 
work is facilitated by the fact that the concentrations of both stimuli 
are graded by equal logarithmic steps, which simplifies coding. 
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FIGURE 1 


Black-or-white represents decision for-or-against sucrose as the stronger stimulus 
in the pair comparison with sodium chloride (Sc), or citric acid (Ca), or quinine 
sulfate (Qs). 

The pair comparisons (rows) are arranged in descending order of increasing 
stimulus contrast, within experiments and subjects, viz., sucrose/Sc = 1/4, 2/4, 
1/3, 3/4, 2/3, 4/4, 1/2, 3/3, 2/2, 4/3, 1/1, 3/2, 2/1, 4/2, 3/1, 4/1; sucrose/Ca = 
1/3, 1/2, 2/2, 2u/2, 1/1, 2/1, 2a/1, 4/1; sucrose/Qs = 2/3, 2a/3, 2/2, 1/1, 4/3, 
2/1, 4/2, 2a/1. (See Table 1 for key.) 

Within experiments and subjects, the half blocks or “sets” (columns) are in 
time order (reading left to right), split into series 1 and 2 (sub-experiments) and 
into chiral complementaries (LH or RH means that the sucrose solution was 
positioned to the left-hand or right-hand side of the pair, as presented). One block 
consists of the i-th LH half-block plus the i-th RH half-block, inter-randomized. 


1 (ths) 2 2 (rhs) 
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The data can be conveniently processed in terms of the joint func- 

tional regression 

Y = In (P/Q) = a+ 8.X — 8,2 + yXZ (4) 
where X and Z are the log concentrations of sucrose and sodium chloride, 
respectively. This is a ruled surface on which the two 6's specify the 
slopes, and y the “‘tilt.” Ideally, i.e. if the two stimuli have a common 
metric, the three quantities a, (8, — 8,), and y will be zero, and equation 
(4) wilt become, in effect, equation (2). Estimates (which we shall 
denote by a, b, , b, , and g) of the parameters can be obtained by the 
method of least squares as applied to 3-term multiple regression. Each 
observation, = InR — In(N — R), must be weighted by R(N — R)/N. 
The extremes R = 0 and R = N are advisably [2} replaced by, re- 
spectively, 1/2 and n — 1/2. Taking the residual variance as unity, 
we may then assess the significance of a, (b, — b,), and g by standard 
procedures. 

The outcome of these tests of fit is shown in Table 2. On the whole 
they are satisfactory, although the two big vahies for g, viz., 4.3 and 
—7.7, need remark. Both are offset by small values in their companion 
series, which suggests that the discrepancies are not characteristics of 
the two subjects concerned. Moreover, these values are of different 
algebraic sign, so that we are not, apparently, dealing with a directional 
tendency. 

TABLE 2 


Estimates oF Statistics RELEVANT TO THE Frr oF THE 
Sucrose /Soprum Caiorive Data To THE Move. 


SE SE SE 
I 1 1.37 1.36 —1.76 
2 1.38 —0.40 —0.46 
II 1 0.61 1.00 —0.28 
2 0.58 —1.42 1.68 
Ill 1 —0.48 —1.57 4.30 
2 0.51 —0.05 0.32 
IV 1 0.42 —0.16 0.49 
2 1.95 2.41 —7.72 


N. B. that SE = standard error, and 
that SE(b: — bz) = +/var (bi) + var (bs) — 2 cov (bibs). 


All results 


As the above evidence of the reasonable fitness of the basic hypoth- 
esis of inter-stimulus intensity comparability became available 
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before the second and third experiments were run, the designs of the 
latter were slightly modified: of 12 contrasts possible with 4 levels of 
sucrose and 3 of not-sucrose, only 8 were used, so selected as to give a 
good “‘spread’”’ (see Table 1 and the legend to Fig. 1). The results of all 
three experiments were then treated similarly, by way of the linear 
regression 


Y=a+ DD, ) 
where D is the difference between the log concentrations, sucrose stimulus 


minus not-sucrose stimulus. Writing y and d for deviations from Y 
and D respectively, we have 


b= wdy/D wi’, 
V(b) = wd’, 


and hence, from (4), an estimate of the Weber fraction, F,, is obtainable. 
Furthermore, equating mathematical differentials to statistical devia- 
tions, we may take 


(7) 


VF) = 


(F + 1)*(in 10)*(in 3)°V(b) 

b* (8) 
as a fair approximation, provided of course that the error of b is not 
large. A goodness-of-fit test arises from the calculation of 


wy’ — b wdy (9) 


to be assessed against tabular x’, for n — 2 degrees of freedom (where 
n is the number of different contrasts), at the chosen level of signifi- 
cance. 

The taste intensity of the not-sucrose stimulus in terms of that of 
sucrose itself will, as already indicated, be located at the point where 
there is an even chance that either of the compared stimuli is judged 
the stronger, i.e., at Y = 0. Therefore 


Intensity ratio, M = antilog (— a/b), 
—a/b = D — Y/b, 
(10) 


V(M) = (M In 10)*V(—a/b). 


The resultant estimates of F and M, together with goodness-of-fit 
x’ for each “assay,” are presented in Table 3. 
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~ TABLE 3 


EsTIMATES OF THE WEBER FRACTIONS, AND OF THE SuBJECTIVE TasTE INTENSITIES 
or Soptum Cutoripe (Sc), Crretc Actp (Ca), AND Suurate (Qs), 
1n Terms oF Sucrose INTENSITY 


Weber Intensity 

Stimulus | Subject | Series fraction ratio Xa-2 

I 1 0.304 + 0.038 10.12 + 0.37 15.0 

2 0.239 + 0.021 13.22 + 0.29 22.4 

Il 1 0.318 + 0.039 13.90 + 0.52 11.9 

Salt 2 0.291 + 0.024 15.20 + 0.38 13.3 
(Se) 

n= 16 Ill 1 0.462 + 0.059 12.11 + 0.55 21.1 

2 0.555 + 0.064 13.44 + 0.48 20.0 

IV 1 0.529 + 0.076 12.91 + 0.62 9.1 

2 0.445 + 0.042 15.67 + 0.50 14.6 

I 1 0.429 + 0.069 182.7 + 9.3 8.3 

Sour 2 0.476 + 0.083 146.7 + 8.4 15.8 
(Ca) 

n=8 Il 1 0.697 + 0.131 245.7 + 17.7 10.1 

2 0.535 + 0.095 283.5 + 19.0 5.2 

I 1 0.457 + 0.091 1650 + 82 0.7 

Bitter 2 0.561 + 0.145 2696 + 199 8.2 
(Qs) 

n= II 1 1.056 + 0.481 2582 + 312 28.5 

| 2 0.787 + 0.243 2374 + 181 7.0 

n = number of different pair comparisons. 
DISCUSSION 


The standard errors in Table 3, being “internal,” give no infor- 
mation on inter-assay differences. If analyses of variance are carried 
out on the tabular means (within stimuli), evidence of differences 
between subjects and, in some instances, between series will be found, 
with the interaction terms just about what might be expected from the 
“internal” error estimates. In general, the salt-sweet results are more 
homogeneous and better fitting than the sour-sweet and bitter-sweet 
results. A modest claim would be that in an experiment of size N, n = 
48,16, for a given subject, the estimates of F and M will have coefficients 
of variation of not less than 10% and 5% respectively. 

It is noteworthy that the present estimates of F, the Weber fraction, 
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whose average is about 1/2, are high compared with the values in the 
literature, whose average is about 1/5. ‘This may be accounted for by 
the fact that our work was done at concentration levels lower than 
usual. (It is known that, in any psychosensory field, the Weber fraction 
tends to increase near the lower end of the scale.) 

The finding that the relative intensities, sucrose: sodium chloride: 
citric acid: quinine sulfate, are about 1:14:220:2300 may be compared 
with Beebe-Center and Waddell’s report of 1:4.4:140:2700 for sucrose: 
sodium chloride: tartaric acid: quinine sulfate [1], at a sucrose level of 
10%. Bujas [4], working at concentration levels intermediate between 
those of ours and of Beebe-Center’s, gave a value of about 1:10 for 
sucrose: sodium chloride. Both Bujas and Beebe-Center used the 
“method of constant stimuli’ [7], which is not purely probabilistic in 
the sense that the method of pair comparison is. 

A graphic check of the appropriateness of the statistical model is 
provided by Figure 2, in which the overall mean estimates of P for all 
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PROPORTION OF DECISIONS THAT C, TASTES STRONGER THAN C, 
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FIGURE 2 


The 32 sucrose (C;) versus not-sucrose (C;) pair comparisons fitted to the 
probabilistic model (with coincident abscissae and pooled slopes). 


16 + 8 + 8 = 32 pair comparisons of C; and C; are plotted on the 
same co-ordinates with pooled b. 
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133 NOTE: On Hemacytometer Counts 


H. C. HaAMAKER 


Philips Research Laboratories 
Eindhoven, Netherlands 


The number of red blood cells counted in the square divisions of a 
hemacytometer counting chamber have been found not to follow a 
Poisson distribution, the variance being significantly smaller than the 
mean. In a recent article in this journal Turner and Eadie’ explain 
the discrepancy by assuming that the blood cells are of finite size so 
that only a limited number can be placed in each square division. 
Under this assumption they establish a hypergeometric distribution 
from which it can be derived that 


or approximately 
n 2 
where 
“ = average number of blood cells per square division, 
a = maximum number of blood cells per square division, 
n = number of square divisions counted, _ 
N = na = maximum number of blood cells these n square divisions 


can contain. 


The purpose of this note is to show that formula (2) can be deduced 
in a very simple manner. We have in fact na = N positions of which 
r = np are occupied in random order by blood cells. A square division 
can then be considered as a random sample of size a taken from a large 
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lot of size N, a fraction 


(3) 


of which is red. In first approximation the number of red items in a 
square division will possess a binomial distribution with variance 


which is identical with (2). 
The main result of Turner and Eadie signifies therefore that the 
distribution of the counts is binomial instead of Poisson. If in theory 
we reduce the size of the cells, the number a increases and the probability 
p decreases, the product 4 = pa remaining constant. In the limit this 

will lead to a Poisson distribution. 

It is perhaps also worth mentioning that from a purely theoretical 
point of view formula (2) would seem to us more satisfactory than (1). 
On closer examination formula (1) is found to represent the expected 
value of the mean square computed from the counts of n square divisions 
over replications consisting in a randomization of exactly r = mp blood 
cells over these n square divisions. In actual practice a replication 
consists in refilling the hemacytometer with a fresh sample from a stock 
of the same diluted blood. Then the number of blood cells r in » square — 
divisions is not constant but may, ideally, be supposed to follow a 
bionomial distribution; the square divisions themselves are to be con- 
sidered as independent samples from the blood in question, and the 
binomial as their correct: distribution. If we randomize a constant 
number of r blood cells over m square divisions the numbers k, , k; 
counted in two particular square divisions are negatively correlated, 
and formula (1) takes this effect into account. But if r varies according 
to a binomial distribution, this introduces an additional positive corre- 
lation which exactly cancels the negative correlation for constant r, s0 
that k; and k; are now independent. 
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, Chi-Squares of Bartlett, Mood, and Lancaster 
in a 2° Contingency Table 


In a three-way contingency table, what similarities or differences 
are there between the interactions tested in the following three cases: 


1. The “second-order interaction” tested by Bartlett (Contingency 
table interactions, J. Roy. Stat. Soc. Suppl. 2: 248, 1935). 

2. The tests of (a) “whether all three criteria are mutually independ- 
ent”? and (b) “whether (one classification) was independent of 
(the other two),”’ as proposed by Mood (Introduction to the Theory 
of Statistics, McGraw-Hill Book Company, p. 250, 1950). 

3. The interactions of Lancaster (Complex contingency tables 
treated by the partition of x°, J. Roy. Stat. Soc., Series B, 13: 
242, 1951). 


From the data in Table 1 below, the chi-squares for the interactions 
have been calculated and are shown following the table. 


TABLE 1 
A Turee-Way ConTINGENCY TABLE 
Ci C; 
B, B, Bi B: 
A. | 79 62 73. 168 
Ag 177 121 81 75 


Chi-squares for the three cases, respectively, were found to be as 
follows: 


1. The “second-order interaction” of Bartlett was found to be 
x’ = 19.568, df. = 1, P < .001. 

2. (a) The test of independence of all three criteria yielded x’? = 
111.10, df. = 4, P < .001. 
(b) The test of whether C was independent of A and B yielded 

x’ = 86.72, df. = 3, P < .001. 

3. In Table 2 below are shown the partition of x’ and the corre- 

sponding degrees of freedom of Lancaster. 


There is no apparent agreement among the three kinds of second- 
order interaction. If they are similar, how can they be interrelated? 
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TABLE 2 
PAagTrrion OF x* 

Source x* df. 
AB 24.19 1 
AC i 
BC 31.38 1 
ABC 7.80 1 

Total 132.00 4 


In non-specific terms it may be said that the second-order 
ANSWER: interactions of Bartlett and Lancaster answer the same 

type of question, and that Mood tests combinations of 
the Lancaster interactions. The discrepancies observed have two 
sources: (i) the models used by the three investigators are different, 
and (ii) different approximations to the chi-square distribution are 
employed. 

More specifically, Lancaster shows that ‘‘--- Bartlett’s x” is asymp- 
totically equal to our second-order interaction - -- However, the asymp- 
totic approach to equality may not be rapid as the observed numbers 
increase. There may be discrepancies even at a size not likely to be 
met with in practice, especially if there is any considerable degree of 
first-order interaction.” Lancaster gives examples in which the dis- 
crepancy is much less than that observed in yours. In my limited 
experience, I have found the two estimates much alike. 

Mood gave special consideration to two cases: (i) completely random 
sampling, and (ii) fixed totals in one classification. In both of these 
special cases, the parameters were estimated from the border totals. 
Consequently, in the 2 X 2 X 2 table there are 4 degrees of freedom 
for the test of whether all three criteria are mutually independent and 
3 for the test of whether one classification is independent of the other 
two. 

Referring to your Table 2, chi-square for complete independence 
corresponds to the sum of the 4 chi-squares, 132.00. This is to be 
compared with your x” = 111.10 calculated by Mood’s method. Both 
correspond to extremely small probabilities between which no profitable 
comparison can be made. 

It is illuminating to observe that this test for complete independence, 
calculated by the familiar method of “expected” frequencies (instead 
of by Mood’s method based on maximum likelihood), results in a chi- 
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square identical with the total in Table 2. This shows that the dis- 
crepancies which you observed are due to the different methods of 
estimation used, each involving an approximation to the chi-square 
distribution. 

For the two cases specified above, Mood’s test (6), whether classi- 
fication C is indepedent of A and B, results in a chi-square (3 degrees 
of freedom) which corresponds to the sum of 3 chi-squares in Table 2: 


Xic + xec + xtc = 68.30 + 31.80 + 7.80 = 107.90. 


This is comparable to your maximum likelihood solution, x’ = 86.72’ 
and to the method of “expected” frequency x” = 93.73. 

For a special case which Mood did not consider, the AB totals fixed 
(rather common in some fields), the tests of complete independence and 
of the independence of C from A and B are identical. In this case, the 
sum of the three Lancaster chi-squares, 


Xac + Xe + 


is equal to the chi-square for the test of C independent of A and B, 
calculated by the use of “expected” frequencies. Here x‘, is identically 
zero. 
It is a pleasure to acknowledge the help of Mrs. Eve Bofinger in the 
preparation of this answer. 


CORRECTIONS TO PAPERS 


H.O. HARTLEY. Maximum Likelihood Estimation from Incomplete 
Data. Biometrics 14 (2), 174-194, 1958. 


On page 183 insert above line 16 from the bottom the missing heading: 
4. Acceleration of convergence. On page 189, Table 5, the half-line for z = 6 
was omitted. This half-line should read as follows: 


zign A, 


4 
7 


ABSTRACTS 


The following papers were presented at meetings of the Biometric 
Society, Eastern North American Region, and the Institute of Mathematical 
_ Statistics in Gatlinburg, Tennessee, April 10, 11, and 12, 1958. 


I. 8S. BANGDIWALA AND R. J. MONROE (University of 
507 Puerto Rico Experiment Station and the Department of Experi- 
mental Statistics, North Carolina State College, Raleigh, North 
Carolina). Ordering Population Means by Sequential Procedures. 


Let be NID (us (i = 1, 2, » N;) 
where y’s and o”’s are unknown. Let the ranked means be 


when it is not known which population is associated with w,;,;. The 
“best” population is defined to be the one with highest mean. 

standard error of the difference between the two means, then the 
statistics 


t;; = — teu — (i,j, = 1,2,--- 
follow a generalized ‘distribution. A sequential procedure is then 
formulated for each of the following four cases giving a stop rule with 
a guarantee of minimum probability P* for correct decision in each 
case whenever the corresponding condition or conditions regarding 
the minimum desired difference 5* are met. The cases are as follows: 


Case Condition 
I. To obtain ¢ “best” (unordered) populations. y;,;; — 4;; > 3* 
II. The “best’’ population. — 8* 


III. To obtain ¢ “best” (ordered) population. pass > 

IV. Complete ranking of k population. 
563 


‘ee 
ct 
= 
+ Dev 
: 


564 BIOMETRICS, DECEMBER 1958 


I. S. BANGDIWALA AND R. J. MONROE (University of 

508 Puerto Rico Experiment. Station and the Department of Experi- 
mental Statistics, North Carolina State College, Raleigh, North 
Carolina). Ordering Population Regression Coefficients by 
Sequential Procedures. 


Let X;. be observations of a (fixed) independent variable (¢ = 1, 
2,---k;a = 1,2, --- , N;) and Y,, , the corresponding observations 
of the independent variable from the population x; , NID (yu, 9%). 
Let: 8; be the population regression coefficients for 7; , and let the 
ranked be ; 


Bry S Bi S S S Bey 


where it is not known which population is associated with 6,;,; . If 
b.:) is the least-square estimate of 6;;; , and s; is the estimate of the 
variance of b,;, obtained from the pooled variance, with n, degrees of 
freedom (assuming common variance), it is shown that 


t.= [by bo) — (Bin — Bi) 


have a generalized ¢-distribution. Using this probability distribution 
function, a sequential procedure is proposed which gives a stop rule 
guaranteeing the minimum probability P* of achieving correct goal 
in each of the following four cases of ordering population regression 
coefficients when the specified condition or conditions are met, 6* 
being the minimum difference desired between two regression coefficients. 


(@,j = 1,2, , k) 


Goal Condition 
I. To determine ¢ populations with highest _ > 5 
regression coefficients (unordered). Biss — Bry 
If. Goal I when ¢ = (the highest). Bret Bre-1y > 


* 


III. Ordering ¢ highest regression coefficients. ee — Bin 2 8 
IV. Goal III when ¢ = k — 1 (complete ranking). 6,;,,; — 8,;; > 6*. 


When the number of degrees of freedom is very large, an approximate 
probability function is proposed. 


z 


I. 8. BANGDIWALA AND R. J. MONROE (University of 
Puerto Rico Experiment Station and the Department of Experi- 
509 mental Statistics, North Carolina State College, Raleigh, North 
Carolina). Ordering Population Variances by Sequential Pro- 


From each of the k normal populations x; (¢ = 1, 2, --- , &) with 
unknown means and unknown varmmecs , 2 sample of 
pendent observations z;. (i = 1,2, --- ,k;a = 1, 2, --- , N,) are taken. 
The “best” population is defined to be the one having the smallest 
variance. Let the ranked variances be 


oi < < ota < < sin 


where it is not known which population is associated with o7,, . Let 
8; be the best estimate of o7,, with n, degrees of freedom. 
It is then shown that 


2 2 
follows a generalized variance ratio (F) distribution; the appropriate 
probability distribution functions are then utilized in proposed se- 
quential procedures for the following four cases in each of which - 
stop rules are formulated with the guarantee of minimum probability 
P* of correct decision for ordering the population in the particular 
case when the condition or conditions regarding the minimum variance 
ratio @* as specified below. exist. : 


Case Description Condition 
‘ ” 2 
‘best” populations (unordered). sa > «>a 


2. To determine the “best” (Case (1) when { = 1). si > ad 


3. To determine the order of the ¢ “‘best’”’. @ ) dara 
populations. 
4. To determine complete ranking (Case (8) when 
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510 G. E. P. BOX (Princeton University, Princeton, New Jersey). 
Some Recent Work on Non-Linear Estimation and Design. 


In the process of experimental iteration the experimenter is often 
interested in the relationship 


n= f€, 8) 
connecting response 7 with the levels of k continuous variables é, , 
, & and involving p parameters 6, , 0., --- , 6, . The object 


may be empirical description or an attempt to determine the basic 
mechanism. In either case, problems of non-linear estimation arise. 

When the model is derived from some theory the function is often 
not given explicitly but, for example, in terms of differential equations. 
Numerical methods for making approximate estimates of goodness of 
fit, and for estimating the 6@’s and their reliability, are discussed. 
Examples illustrate: 


(i) fitting a kinetic model defined by differential equations 
(ii) fitting a reduced polynomial (canonical) model 
(iii) determining transformations in independent variables. 


Various interesting design problems arise. A method worked out 
with Professor H. L. Lucas for obtaining best designs in the non-linear 
situation is described and applied for a model arising from chemical 
kinetics and an empirical reduced canonical model. 


RALPH A. BRADLEY (Virginia Polytechnic Institute, Blacks- 
511 burg, Virginia). Triangle, Duo-trio, and Difference-from-Control 
Tests in Taste Testing. 


Much experimental work has been done in comparing triangle 
and duo-trio tests for the detection of differences in. taste testing. 
Some of the mathematical implications have also been discussed by 
Hopkins and Gridgeman [Biometrics 11 (1)]. In this paper we assume 
subjective responses on a response scale and develop means of calcu- 
lating corresponding probabilities of correct choices in triangle and 
duo-trio tests and expected difference from control in the difference- 
from-control test. The two probabilities mentioned above are expressed 
in terms of the power function of analysis of variance but required 
special evaluations in this problem. A table showing comparisons of 
the several tests is presented and from these power comparisons and 
power efficiencies may be evaluated. 
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512) W. CLUNIES-ROSS (Virginia Polytechnic Institute, Blacks- 
burg, Virginia). Mixed Exponential Failure Distributions. 


In this paper the random, or negative exponential, distribution 
of failure is assumed. However, the parameter of failure is further 
assumed to have a probability distribution. In particular, the overall 
distribution of failure is evaluated when the distribution of the parameter 
is (a) Chi-squared and (b) Inverse Gaussian. 

The effect of testing, and the appropriateness of two types of tests, 
ure considered. It is demonstrated that for the tests considered the 
“mean loss of mean life’ due to failure during testing is equal to the 
mean time under test. 


W. 8. CONNOR (National Bureau of Standards, Washington, 
513 Db. C.). Use of the Direct Product of Matrices in the Analysis of 
Factorial Experiments. 


Investigation of unsymmetrical subsets of factorial combinations 
has been retarded by the apparent difficulties in the statistical ‘analysis. 
tecent work has provided a method which greatly facilitates the de- 
termination of the normal equations for unsymmetrical factorial 
combinations. This method promises to be very useful in the analysis 
of fractional factorials of mixed series, and in many other factorial 
designs. The theory of this method is discussed and illustrated by an 
example of a one-half replicate of the 2°3° complete factorial. 


RICHARD G. CORNELL (Communicable Disease Center, 
514 Atlanta, Georgia). Non-Linear Estimation for Linear Com- 
binations of Exponentials. 


A brief review of four methods presently available for estimating 
the parameters of a linear combination of exponentials is given. The 
four methods considered are the iterative maximum likelihood pro- 
cedure, the Prony method, the graphical “peeling off” procedure, and 
the method of grouping sets of successive observations. Because 
experimenters often do not have adequate computing facilities to 
compute maximum likelihood estimates, the latter three methods are 
examined by means of some empirical sampling studies. Even if 
adequate computing facilities are available, preliminary estimates 
may be obtained by one of these methods. Empirical studies are also 
used to investigate the improvement in fitting the model y = ao + a, A‘ 
when maximum likelihood iterations are used to improve estimates 
obtained by the method of grouping.. 
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515 H. A. DAVID (Virginia Polytechnic Institute, Blacksburg, 
Virginia). Paired Comparisons and Tournaments. 


Suppose two stimuli (e.g. flavors, colors) are presented to an observer 
who simply states which one he prefers. Such a paired comparison 
may be regarded as a game between two contestants (the stimuli), 
and an experiment involving paired comparisons among n stimuli is 
analogous to a tournament of n players. IXnock-out and Round Robin 
(full-scale) tournaments were studied, with various assumptions about 
the strengths of the players. 

Questions considered include finding the probability that a specified 
player will win a knock-out tournament, with or without ‘‘seeding.” 
For the Round Robin tournament the distribution of scores is obtained 
on the assumption of equality among the players. This distribution 
is used to construct tests of significance of whether a given player is 
above average strength, whether two given players differ, and whether 
the top score attained is significantly high. 

Some comparisons are made of the efficacy of the two types of 
tournaments in picking out the best player. 


JAMES R. DUFFETT (Virginia Polytechnic Institute, Blacks- 
516 burg, Virginia). Estimation of System Reliability from Component 
Reliabilities. 


A complex automatic electro-mechanical system is to be synthesized 
from physical components. The probability density function of the 
failure times of the system is considered for cascaded components 
and also for redundant componentry. The estimation of system 
reliability from the failure times of the individual components is then 
discussed. 


DAVID B. DUNCAN (Department of Statistics, University of 
517 North Carolina, Chapel Hill, North Carolina). Useful Bayes 
Solutions for Multiple Comparisons Problems. I. 


A Bayes solution is developed for the common t-test problem of 
testing the hypothesis @ < 0 against the alternative 6 > 0 given observed 
values of x and s where z is normally distributed with @ as mean and 
variance o” and s’ is an independent estimate of o° distributed as 
x0 /v. The ultimate objective is to solve many forms of multiple 
comparisons problems generated by the restricted products (Lehmann, 
A.M.S., 1957, 1-25) of problems of the given form, the Bayes solutions 


fe 
7 
4 


ABSTRACTS 569 


to be obtained as corresponding products of solutions of the form 
developed. The loss function assumes losses proportional to | @ |, the 
factor for type I errors being k times that for type II errors, k > 1. 
The Bayes function is a normal density with mean 0 and variance 
yo. These functions fit, at least to a satisfactory degree of approxi- 
mation, a wide variety of problems met in practice. The solution 
(restricted to invariant procedures) has the critical region x/s > t 
where ¢ is a function of the degrees of freedom », loss ratio k and dispersion 
ratio y*. A brief table of ¢ with these three arguments is presented. 


RUDOLF J. FREUND AND RICHARD W. VAIL, JR. (Vir- 


518 ginia Polytechnic Institute, Blacksburg, Virginia). On Problems 
in Residual Analysis. 


A two-stage regression model is considered where the first stage 
is a regression, performed in the normal manner, and the second stage 
consists of a regression on the residuals from the first stage regression. 
Possible biases resulting from the use of this model as against a simul- 
taneous or one stage regression model are investigated. Although the 
two stage method does provide biased results, it may be quite useful 
for certain specific purposes such as the screening for additional factors 
in a system that can be described by regression analysis. 


JOHN J. GART (Oak Ridge National Laboratory, Oak Ridge, 
519 ‘Tennessee, and Virginia Polytechnic Institute, Blacksburg, 


Virginia). A Sequential Decision Procedure for Comparing 
Survival Curves. 


Kimball, Burnett, and Doherty [1957, Radiation Research 7: 1-12] 
devised a method of sequential sampling for screening compounds for 
their effectiveness against injury caused by irradiation. Their analysis, 
which employs Wald’s method of dealing with double dichotomies, 
is not rigorous because the random variation of the binomial parameter 
involved is not taken into account. The procedure given here provides 
for this by assuming logistic survival curves. There are set forth tables 
of resultant unconditional distributions from which Wald’s Z statistic 
can be computed. The prescribed probabilities of the errors of the 
first and second kinds are shown to be fairly insensitive to the particular 
form of the sigmoidal survival curve assumed by comparison with the 
results assuming uniform distributions. The OC function, the ASN 
function, and the effect of truncation are treated for particular cases. 
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WILLIAM A. GLENN AND CLYDE Y. KRAMER (Virginia 
520 Polytechnic Institute, Blacksburg, Virginia). Analysis of Variance 
of a Randomized Block Design with Missing Observations. 


The estimation of several missing values in a randomized block 
design is considered. The method used is that of minimizing the 
error sum of squares, proposed originally by Yates [1933]. explicit 
equations for each absent value are derived for all cases in which not 
more than three values are missing. general formula valid for any 
permissible number of missing observations is given for the case in 
which no two values are missing in the same block or treatment, and 
also for the case in which all of the values missing are in a single block 
or treatment. A procedure for the completely general case is proposed, 
This, although requiring the inversion of a symmetric matrix of order 
equal to the number of missing observations, may prove to be less 
tedious in application than the iterative method proposed by Yates. 

A direct method of analysis not requiring a correction for bias 
in the treatment sum of squares is discussed and demonstrated. Formulas 
are given for the bias introduced when an analysis of variance is 
carried out on the augmented data. These, though equivalent to the 
generalized formula given by Yates, are found to differ from the latter 
slightly in form. 


MARVIN A. KASTENBAUM (Oak Ridge National Laboratory, 
521 Oak Ridge, Tennessee). Estimation of Relative Frequencies of 
Four Sperm Types in Drosophila Melanogaster. 


The method of maximum likelihood is applied in estimating the 
relative frequencies of four sperm types in Drosophila Mclanogaster. 
The distribution of the observed frequencies is characterized by the 
product of two multinomial distributions, in each of which one parameter 
of the four is not observed. In essence, the data may be regarded as a 
2 X 4 contingency table with two missing values. Explicit: estimates 
of all the parameters are determined, as well as explicit expressions of 
their asymptotic variances and covariances. 


THERESE KELLEHER (United States Department of Agri- 
522 culture, Washington, 1). C.). A Synthesis of Diallel Cross 
Methodology. 


In recent years the study of diallel crosses, composed of inbred 
parents and the offspring obtained by intercrossing the parents, has 
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been proposed as a means of characterizing the hereditary constitution 
of the randomly mating population from which the parents were derived. 

In this paper a regression procedure, which postulates a specific 
non-linear relationship between parents and offspring, was compared 
with the general method of correlation among relatives as adapted to 
this type of genetic material. The composition of various phenotypic 
measures used to estimate genotypic parameters was obtained for a 
number of models, and it was shown that genetic interpretation of a. 
joint analysis of the inbred parents and hybrid offspring is dependent 
upon the assumption that heredities of inbred parents and hybrid 
offspring react similarly to environmental influences. — 

Results of the analyses of data from two corn populations were used 
to illustrate the methodology and to point out some of the intrinsic 
difficulties present in this type of data when valid inferences to randomly 
mating populations are desired. 


FRANK G. MARTIN, JR. AND C. CLARK COCKERHAM 

523 (North Carolina State College, Raleigh, North Carolina). Adapta- 
tion of High Speed Computing Machines for Empirical Selection 
Studies. 


A program which has been adapted to the IBM 650 is presented for 
empirically evaluating the effects of genetic parameters on mass selection. 
The genetic parameters that may be varied are (1) the number of loci, 
m = 1, 2, --- , 25, per chromosome, (2) the selection intensity which 
is given by the number of individuals, n = 1, 2, --- , 47, of each sex 
to be selected from the total number, N = 1, 2, --- , unlimited, indi- 
viduals of each sex, (3) the number of generations of selection, (4) 
arbitrary scores for each of the three genotypes which are constant for 
all loci and additive between loci, (5) the initial gene frequency, p = 0.00, 
0.01, --- , 1.00, which is the same for all loci, (6) the recombination 
fraction, a = 0.00, 0.01, --- , 0.50, which is the same between every 
pair of consecutive loci, and (7) the environmental variance, «” = 0.00, 
0.01, --- , unlimited. The first four parameters are constants while 
the last three parameters are expected values of variates which are 
stochastically determined. Joint consideration of parameters (4) and 
(7) allows the expected value of heritability in the initial generation 
to be predetermined. Individuals within each sex are selected on the 
basis of their combined genetic and environmental score. Selected 
miles are paired randomly with selected females to generate offspring 
for the next generation. 
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P. A. MILLER, J. C. WILLIAMS, AND H. F. ROBINSON 

524 (North Carolina State College, Raleigh, North Carolina). Estima- 
tion and Use of Genotype-Environmental Interaction Components 
of Variance in Cotton Breeding. 


Fifteen varieties of cotton were evaluated at the same 9 locations 
in North Carolina for a 3 year period. The study was designed so as 
to obtain estimates of the relative magnitudes of the various types of 
variety X environment interactions in cotten variety tests, and to 
consider the implications of these interactions on variety evaluation 
procedures. 

In regard to yield, the variety X location and variety X year 
interactions were both very small and statistically non-significant. 
The second order interaction of varieties X locations X years, however, 
was of substantial magnitude and highly significant. These results 
indicated that the varieties responded quite differently when grown 
under different environments, but that there were no consistent location 
or year effects on differential varietial response during this period of 
testing. Observations on the individual tests suggested that the 
patterns of rainfall distribution and insect infestation were the important 
factors determining differential varietal response. Differences in soil 
type in the area sampled appeared to have little effect per se on the 
relative performance of the varieties. 

Lint percentage, boll size, fiber length, and fiber strength likewise 
showed this same general pattern of the second order interaction being 
larger than either of the first order interactions. For these traits, 
however, all three interaction components were very small relative to 
the variety variance component and thus may be considered as relatively 
unimportant. 

The lack of sizeable variety X location interactions for the state as 
a whole indicate there would be little if any advantage to be gained 
from attempting to divide the state into subareas for breeding and 
testing purposes. Breeding for specific environments might be feasible 
however, to the extent that the important environmental factors are 
under some degree of control or predictable. 


V. L. MOTE, M. V. PAVATE, AND R. L. ANDERSON (North 
25 Carolina State College, Raleigh, North Carolina). Some Studies 
in the Analysis of Categorical Data. 


V. L. Mote studied the effect of misclassification on the asymptotic 
power of the x*-test for goodness of fit when the true class probabilities 
are given under the null hypothesis and for contingency tables with 
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stratified sampling and with completely random sampling. Results . 
were derived for known misclassification probabilities and for certain 
cases when these probabilities had to be estimated from the data. The 
latter were for goodness of fit tests when all misclassification probabilities 
were equal and when misclassification occurred only in adjacent classes. 
The effect on the Type I error rate of neglecting misclassification was 
also studied. 

M. V. Pavate examined the small sample properties of the x°-test 
for independence in selected 2 X 2 contingency tables. He determined 
the exact size of the Type I error and the power of this test for five null 
configurations and five different non-null configurations for each null 
one. Both stratified and completely random sampling plans were 
studied, for total sample sizes of 10 and 20, and with nominal errer 
rates (a) of 0.01 and 0.05. The same situations were studied using 
Yates’ correction for continuity. The true error rate did not deviate 
materially from a for the uncorrected x’; however, the true error rate 
was always much below a when the continuity correction was applied. 

R. L. Anderson investigated the problem of testing for independence 
in an r X r contingency table based on the rankings of r products by 
m consumers. The columns are the r ranks and the rows the r products, 
the number of observations in the (77) cell being the number of tims: 
product i was given a ranking of j. If we let X* be the usual approximats 
x’-statistic, it is shown that (r — 1) X*/r has an asymptotic x'-dis- 
tribution with (r — 1)? degrees of freedom. Other methods of analyzing 
such data are discussed. 


E. J. WILLIAMS (North Carolina State College, Raleigh, North 
526 Carolina). Optimum Allocation for Estimation of Polpnomiai 
Regression. 


In fitting a polynomial of degree p in a variable x to a dependent 
variable y, we are interested initially in estimating the coefficient of 
z’ and testing its significance. If the n observations on z can be located 
at will within given limits (which may be taken as + 1), we may so 
allocate them as to give maximum precision to the estimate of this 
coefficient. 

It is shown that optimum allocation requires exactly p -}- 1 different 
values (possibly repeated) of the independent variable. If 2 is a multiple 
of 2p , the optimum possible allocation is attainable; this is given by the 
2p values of 


cos kx/p (k = 0, 1,2, --- , 2p — 2). 


= 
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Since these values, apart from the values 4 
pairs, they give just the required p + 1 points. 
This allocation has efficiency 50°, for all coefficients of degree 
lower than p. 


and 1, coincide in 


A modified allocation which gives equal efficiency for 
all coefficients is considered. 
If the allocation is restricted—for example, to equally spaced points— 
the restricted optimum may be determined, and its efficiency compared 
with that of the unrestricted optimum. 


The efficiency in this case 
falls off rapidly as p increases. 


The optimum allocation attainable 
here is that in which the numbers of observations at the different points 
are proportional to the binomial coefficients. 

In general situations it may be shown that the allocation which 
results in each observation appearing in the estimate with equal 
weight is optimum. This accords with intuitive ideas. 

The relationship of this problem to the weighing problem is discussed. 
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THE BIOMETRIC SOCIETY 
German Region 


The Fifth Biometric Colloquium took place in Bad Nauheim from the 
twenty-fourth to the twenty-sixth of January, 1958. Topies discussed 
fell under the main headings of correlation theory, biometeorology and 
bioclimatology, biomathematices, and the use of electronic calculators. 
A brief account by the Regional Secretary appears in the Allegemeines 
Statistisches Archiv, Bd. 41, 1958. One hundred and thirty-one parti- 
cipants took part. Abstracts of a number of the papers given at the 
Colloquium appeared in the September 1958 issue of Biometrics (Vol. 
14, no. 3). 


Fourth International Biometric Conference* 


Nearly 200 participants registered for the Fourth International 
Biometric Conference which was held in Ottawa from August 28 to 
September 2, 1958. Proceedings were opened at 9:15 a.m. on August 
28 by an address of welcome from Dr. C. H. Goulden, the Society’s 
President, who then took the chair at the opening session, one of three 
making up a symposium on biometrical genetics. This first session 
dealt with theoretical genetics, and the next two sessions on August 28 
and 29 contained papers on the design of experiments and on experi- 
mantal results. Sessions in the conference proper were devoted to 
papers on multivariate analysis, chi-square and other biometric tech- 
niques, interpretation of experimental results, mathematical and statis- 
tical models in biology, biometry in clinical research, and ecology and 
animal behaviour. 

The Local Arrangements Committee had arranged a full social 
programme. Participants and their families had an opportunity to 
get acquainted at an informal reception on the evening of the opening 
day. This was also attended by official representatives of many of the 
countries whose citizens were attending the conference. On the following 
evening, a banquet was held at the Ottawa Valley Hunt Club, at which 
the speakers were the President and Dr. J. W. Tukey, who was intro- 
duced by C. 1. Bliss. (pigram of the week, from Dr. Tukey’s speech— 
“Problems we do not understand, we call multivariate.) On Sunday, 


*Publieation of these proceedings of the Fourth International Biometric Conference was supported 
in part by a grant from the United States National Science Foundation. 
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August 31, participants were taken on a coach tour of the Gatineau 
National Park area to the north of the city, followed by a buffet supper 
at the University of Ottawa, while on Monday evening a picnic supper 
was served on the lawn of the Dominion Experimental Farm. The 
Ladies’ Programme was also a full one, including a coach tour of the 
city and visits to the Royal Mint, the Parliament Buildings, and the 
National Gallery. 

The Conference was made possible by several substantial financial 
grants. Our gratitude is due in this respect to the National Research 
Council of Canada, at whose invitation the Conference was held in 
Ottawa; to the U. S. National Science Foundation; and to the Inter- 
national Union of Biological Sciences. The Conference was held in the 
auditorium of the Dominion Bureau of Statistics; for the use of this 
fine room, for extensive cafeteria facilities and for much assistance with 
printing and duplicating, our thanks go to the Bureau and especially 
to Mr. W. E. Duffett, Dominion Statistician, and Dr. J. T. Marshall, 
Assistant Dominion Statistician. The local arrangements were in the 
hands of a number of committees whose chairmen are listed elsewhere, 
but special mention must be made of Dr. G. B. Oakland, in charge of 
local arrangements, whose untiring efforts extending over many months 
resulted in an outstandingly successful meeting. 


ORGANISING COMMITTEES 


Programme Committee: Session Organisers: 
L. L. Cavalli-Sforza Session 4—E. J. Williams 
A. E. Cornish Session 5—W. G. Cochran 
W. J. Dixon Session 6—T. A. Bancroft 
D. J. Finney Session 7—L. Martin 
A. Groszmann Session 8—D. Mainland 
B. Harshbarger Session 9—D. G. Chapman 


O. Heinisch 
L. Martin 

G. B. Oakland 
G. 8. Watson 


Genetics Symposium Organising Committee: 


QO. Kempthorne (Chairman) 
Mather 

Comstock 

J. L. Lush 

C. R. Henderson 
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Olfawa Committees: 


Central Committee: A.M. Dutton, C. TL. Goulden, B. Harshbarger, 
J. W. Hopkins, J.T. Marshall, G. B. Oakland. 


Finance Committee: CC. H. Goulden, J. W. Hopkins, A. W. Kimball, 
G. B. Oakland. 


Local Arrangements: 


Chairman: G. B. Oakland 

Accommodation: C. H. Hickman 

Editorial: N. T. Gridgeman, J. St. Pierre 

Exhibits: G. H. Josie 

Printing: C. W. Nickel 

Publicity: Miss C. E. Cox, J. W. Fisher, M. J. Perrault 
Registration: Smillie 

Social: Mrs. G. B. Oakland 

Transportation: Miss E. Newman 

Projection: W. R. Childers 


The Biometric Society:—General Officers 

President: C. H. Goulden 

Secretary: M. J. R. Healy 

Treasurer: A. W. Kimball 

Host Region (E.N.A.R.) 

President: B. Harshbarger 

Secretary-Treasurer: T. W. Horner 
SCIENTIFIC PROGRAMME 

(Sessions 1-3 make up the Symposium on Biometrical Genetics.) 

Session 1. Theoretical Genetics 


Chairman: C, H. Goulden 


A.  R.E. Comstock: Dominance, genotype-environment interaction, and heterosis. 
B. KX. Mather: The balance-sheet of variability. 

C. ©. Kempthorne: Biometrical relations between relatives and selection theory. 
D. bk. C. R. Reeve & J. C. Gower: Inbreeding with selection against homozygosis 


at one or more loci. 
kr. C.C. Cockerham & F. G. Martin: Selection studies on the I.B.M. 650. 
F. D.S8. Robson: Quantitative inheritance in haploids. 
( I, Keiter: Multifactorial genetics in man. 


Session 2. Design of Experiments 
Chairman: F. H. W. Morley 


A. A. Robertson: Designs for measuring heritability and genetic correlations. 


*Papers read by title. 
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H. LeRoy: The interpretation of heritability coeflicients (presented by J. b. 

hk. Goodwin, G. Diekerson, & W. Lamoureux: Separating genetie and 
environmental changes in animal populations under selection. 

J. A. Nelder: Hstimation of variance components in experiments on quanti- 
tative genetics. 

A. W. Nordskog & O. Kempthorne: Genotype-environment interactions in 
random sample poultry tests. 


Session 3. Experimental Results 
Chairman: J. Lush 


H. F. Robinson, C. C. Cockerham, & R. H. Moll: Estimation of dominance 
variance and effects of linkage bias. 
G. A. Martin & A. E. Bell: Experimental check on predicted response to 
selection. 
R. Mason, H. Nicholson, R. Bogart, & H. Krueger: Negative heterosis in 
mouse crosses. 
. H. Smith & D. 8. Robson: Inheritance of dimensions of flower parts in 
tobacco. 


> 


. F. Purser: Regression corrections to increase efficiency of selection. 
C. K. Chai: Heritability of iodine metabolism in mouse thyroids. 


Session 4. Multivariate Analysis 
Chairman: J. W. Hopkins 


M. J. R. Healy: Multivariate growth data. 

M. B. Dawford, H. M. Hughes, & R. C. MeNee: Analysis of repeated-measure- 
ment experiments. 

C. R. Rao: Polynomial regression with correlated variables. 

R. M. DeBaun & A. M. Schneider: Tests of assumptions in multivariate 
analysis. 


Session 5: Chi-square and Other Biometric Techniques 
Chairman: A. W. Kimball 
_ 5S. Watson: Recent results on chi-square goodness of fit tests. 


N. Roy & E. Diamond: Analysis of categorical data. 
. F, Gjeddebaek: Some simple estimates from grouped observations. 


APO 


. Graff: Les levures sélectionées en cidrerie. 


Session 6. Interpretation of Experimental Results 


Chairman: T. A. Bancroft 


'. B. Leech & M. J. R. Healy: Experiments on growth rate. 

W. T. Federer: Treatment design and interpretation of results. 

J. H. Edwards: Estimators relevant to 2 * 2 tables. 

P. G. Homeyer, R. L. Preston, & W. Burroughs: A bioassay for detecting 
small quantities in the presence of an interfering substance. 

J. H. Edwards: Interpreting differences in the internal environment. 

C-M. Wang: Analysis of fractions. 


*Papers read by title. 
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GV. Sableanu & R. Holban: Interprétation des expériences hématologiques. 
Discussant: D. B. DeLury 


Session 7. Mathematical and Statistical Models in Biology 
Chairman: D. B. DeLury 


§ 
A.  N. T. Gridgeman: The lady tasting tea. 
B.  W. Ludwig: Theorical calculations, model tests, and empirical proofs. 
C.t A Fraser: Monte Carlo simulation of genetie systems. 


C. I. Bliss: Periodic regression. 

C. J. Mode: A host-pathogen system for rust of cereals. 

F.* V.Sahleanu: Recherche mathématique dans les sciences morphologiques. 
V. Sahleanu: La eybernétique comme point de vue. 


Session 8. Biometry in Clinical Research - 


Chairman: C. White 


. Yerushalmy & C. Ei. Palmer: The epidemiological method. 
s. Jablon: Clinical-statistical interaction in follow-up studies. 
©. —K. Carter: Clinical trials in a pharmaceutical company medical viewpoint. 
D. D. Mainland: Non-statistical problems met by medical school statisticians. 


E.* Rosin: Blood-group distributions. 
Session 9, Ecology and Animal Behaviour 
Chairman: T. Burnett 
A.D. E. Davis: Behavioural assumptions in census methods. 
B. RK. EE. Blackith: Vector representation of paltirus of growth in grasshoppers, 
C. . R. Rich: Sexual behaviour in the population dynamies of T'ribolium. 


D. F.S. Andersen: Competition in population of one age-group. 


*Papers read by title. 

+To appear in symposium proceedings. 

Abstracts of all the above papers will appear in the next issue of Biometrics (Vol. 15, no. 1). The 
proceedings of the symposium will be published early in 1959 by the Pergamon Press. Many of the 
conference papers will appear in full in Vol. 15 of Biometrics. 


CHANGES IN MEMBERSHIP 
(July-September, 1958) 


Changes in Address 


Mr. G. N. Alexander, Bayview Road, Belgrave, Victoria, Australia 

Mr. C. G. Barraclough, London House, Guilford Street, London W. C., 
England 

Mr. Robert Bechhofer, Department of Statistics, Stanford University, 
Stanford, California, U.S.A. 

Mr. Philippe Bordeau, Yale School of Forestry, 360 Prospect Street, 
New Haven 11, Connecticut, U.S.A. 

Mr. Samuel H. Brooks, 1030 Iliff, Pacific Palisades, California, U.S.A. 

Mr. Pierre Corcelle, 20 rue Monsieur, Paris VII°, France 
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Mr. Jerome Cornfield, 615 N. Wolfe Street, Baltimore 5, Maryland, 
U.S.A. 

Dr. Edwin L. Cox, Biometrical Services —ARS, Plant Industry Station, 
Beltsville, Maryland, U.S.A. 

Mr. Wilford L. Davis, 1765 Skyline Drive, Apt. 19, Pittsburgh 27, 
Pennsylvania, U.S.A. 

Dr. Denoix, Directeur de |’Institut Gustave Roussy, Centre Clinique 
et Thérapeutique, Villejuif (Seine) France 

Mr. Charles W. Dunnett, Statistical Group, Lederle Laboratory, Ameri- 
can Cyanamid Company, Pearl River, New York, U.S.A. 

Mr. George E. Ferris, General Foods Corporation, Post Division, 275 
Cliff Street, Battle Creek, Michigan, U.S.A. 

Dr. Hans Gebelein, Battelle-Institut, Wiesbadener Strasse, Frankfort- 
Main, Germany 


Dr. Hans-Joachim Heite, Deutschhausstr. 9, Oberarzt der Dermatologis- 
chen Universititklinik, (16) Marburg/Lahn, Germany 

Dr. Stanley R. Hill, c/o Mrs. Anita I’. Weldy, 6718—40th Avenue, 
S., Seattle, Washington, U.S.A. 

Prof. J. O. Irwin, Department of Biostatistics, School of Public Health, 
Chapel Hill, North Carolina, U.S.A. 

Mr. Jacob L. Kovner, Route 1, Box 317E, Ft. Collins, Colorado, U.S.A. 

Dr. A. Lilienfeld, 6200 Gist Avenue, Baltimore 15, Maryland, U.S.A. 

Mr. Laurance G. Locke, 49 Keeler Street, Rochester, New York, U.S.A. 

Miss. Kthelyne L. McBee, P.O. Box Uleta Branch, 475 N.E. 167th 
Street, Miami, Florida, U.S.A. 

Dr. E. Novitski, Biology Department, University of Oregon, Eugene, 
Oregon, U.S.A. 

Dr. C. Postma, P.A.W., Postbus 33, Wageningen, Netherlands 

Dr. Anita Rapoport, 820 Coronado Terrace, Los Angeles 26, California, 
US.A. 

Mr. Verne H. Reckmeyer, 2318 Poincianna Street, SW, Huntsville, 
Alabama, U.S.A. 

Miss Jean Roberts, 2800 Quebee Street, NW, Apt. 513, Washington, 
D.C., U.S.A. 

Mr. G. W. Rogerson, c/o ICIANZ, Central Research Laboratories, 
Newston Street, Ascot Vale, Victoria, Australia 

Mr. R. J. Rowlands, 70 Ursa Street, N. Balwyn, Victoria, Australia 

Prof. Henry Scheffé, Princeton University S.T.R.G., 167 Nassau Street, 
Princeton, New Jersey, U.S.A. 

Dr. Morton D. Schweitzer, School of Public Health, 600 West 168th 
Street, New York 32, N.Y., U.S.A. 

Dr. G. J. M. Smith, Fisheries Research Board of Canada, Biological 
Station, 5389 Richmond Street, London, Ontario, Canada 


- 
1 
| 
2 
i 
7 
3 
4 
4 


THE BIOMETRIC SOCIETY 581 


Dr. Albert L. Tester, Professor of Zoology, University of Hawaii, 
Honolulu 14, Hawaii 

Prof. J. W. Tukey, Fine Hall, Box 708, Princeton, New Jersey, U.S.A. 

Mr. Maleolm FE. Turner, Department of Biophysics, Medical College 
of Virginia, Richmond 19, Virginia, U.S.A. 

Dr. Albert C. Walker, 2231 Forestview Road, Evanston, Illinois, U.S.A. 

Dr. E. Weber, Rupprechtstr. 20, Berlin-Lichtenberg 4, Germany 

Mr. Henry B. Wells, Department of Biostatistics, University of North 
Carolina, Chapel Hill, North Carolina, U.S.A. 

Mr. M. H. Westmacott, Candleford, 26 Gordon Avenue, Stanmore, 
England 

Mr. Harold E. Young, 1412 East Van Buren Street, Colorado Springs, 
Colorado, U.S.A. 


New Members 
At Large 


Mr. Louis Munan, Fulbright Commission, ¢/o American Embassy, 
Quito, Ecuador, South America 

Ing. Agr. D. Monzon Paiva, Seccion de lstadistica, C.I.A. Maracay, 
Venezuela, South America 

Dr. Julian Perkal, Wroclaw-Oporow, Harcerska 24, Poland 


Belgian 


Mr. Emmanuel Jean Crenier, Avenue Malvoz n°3, Liége, Belgium 

Mr. Robert Marechal, I.N.I.A.C., Gandajika (far Luputa) Congo Belge 

British 

Mr. A. IF. Purser, 6 South Osward Road, Edinburgh 9, Scotland 

ENAR 

Dr. David W. Alling, 1124 Ellis Hollow Road, Ithaea, New York, U.S.A. 

Mr. Wallace R. Blischke, Department of Plant Breeding, Cornell Univer- 
sity, Ithaca, New York, U.S.A. 

Mr. William C. Burrows, 231 Agronomy Building, Iowa State College, 
Ames, Iowa, U.S.A. 

Prof. John A. Carpenter, 52 Hillhouse Avenue, Yale Station, New 
Haven, Connecticut, U.S.A. 

Mr. Roger W. Hepler, Department of Horticulture, University of 
Illinois, Urbana, Ilinois, U.S.A. 

Dr. Mats Lagervall, Department of Animal Husbandry, Lowa State 
College, Ames, Iowa, U.S.A. 

Mr. Scott Overton, Department of Experimental Statistics, North 
Carolina State College, Raleigh, North Carolina, U.S.A. 
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Mr. Paul G. Sanders, Staff Statistician, Abbott Laboratories, North 
Chicago, Illinois, U.S.A. 

Dr. Herbert M. Schaaf, Box 203, Mandan, North Dakota, U.S.A. 

Dr. Warren C. Stiles, Department of Horticulture, Rutgers University, 
New Brunswick, New Jersey, U.S.A. 

Dr. A. B. Watts, Poultry Industry Department, Louisiana State Uni- 
versity, Baton Rouge, Louisiana, U.S.A. 

Mr. Leonard 8S. Zahn, 30 Florgate Road, Bethpage, New York (Farm- 
ingdale P. O.), U.S.A. 


German 


Prof. Dr. Leopold Kruger, Gleiberger Weg 123, Giessen-Lahn, Oberer 
Hardthof, Germany 

Prof. Paul Lorenz, Kaiserstuhlstr. 21, Berlin-Schlachtensee, Germany 

Dr. Heinrich Staude, Bahnuhofstr. 6, Seebenisch uber Markranstadt, 
Germany 


NEWS AND ANNOUNCEMENTS 


Members are invited to transmit to their National or Regional Secretary 
(if members at large, to the General Secretary) news of appointments, dis- 
tinctions, or retirements, and announcements of professional interest. 


M.J.W. de Groot of the Netherlands took his degree in medicine at 
Leyden University. His thesis is published by the Netherlands Institute 
for Preventive Medicine under the title, “Quantitative study of absence 
due to neurosis of factory workers in the Netherlands.” 

Ir. M. van Albada of the Institute for Poultry Research at Beck- 
bergen has been appointed lecturer in poultry science at Wageningen 
Agricultural University. 

Professor Henry Scheffé of the University of California will be visiting 
Professor at Princeton University for the academic year 1958-59, where 
his work with the Statistical Techniques Research Group will be par- 
tially supported by a grant from the National Science Foundation. 


BIOMETRIC SOCIETY, E.N A.R. SPRING MEETING 

The joint Spring Meeting of the Biometric Society, E.N.A.R., the 
Institute of Mathematical Statistics, and the Physical Science Section 
of the American Statistical Association will be held at the University 
of Pittsburgh on March 19-21. ‘Titles and abstracts of papers to be 
contributed at this meeting should be submitted in triplicate to Jerome 
Cornfield, Department of Biostatistics, The Johns Hopkins University, 
615 N. Wolfe Street, Baltimore 5, Maryland, by no later than February 
15, 1959. 

Attention is drawn to the instructions on the preparation of abstracts 
on the inside back cover of Biometrics. Information on program and 
accommodations will be mailed to the members of the Region by the 

tegional Secretary at a later date. 


NOTICE TO AMERICAN STATISTICAL ASSOCIATION 
SUBSCRIBERS 

The Council of the Biometric Society has decided to discountinue 
the arrangement giving members of the American Statistical Association 
the privilege of purchasing Biometrics at the reduced rate of $4.00 effec- 
tive January 1, 1959. Subscribers who have been receiving Biometrics 
under that arrangement are invited to join and to participate in the 
appropriate Region of the Society by applying to the Regional Secretary 
for membership. Subscribers who do not reside in one of the Regions 
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of the Society may become members at large of the Biometric Society 
by applying to the General Secretary of the Society for membership. 
Non-member subscriptions are also available and should be sent to 
the business office of Biometrics. Subscription and membership rates 
are noted on the inside front cover of Biometrics. 


EDITOR’S ACKNOWLEDGMENTS 

It was with extreme regret that we recently received and accepted 
Professor G. W. Snedecor’s resignation as editor of the section of Bio- 
metrics on Queries and Notes. The long-term efforts of Professor 
Snedecor have been most helpful to both the present editor and both 
past editors, and we know that his section was regarded as most inter- 
esting and instructive by readers of Biometrics. We intend to continue 
this section in the way that it was developed by Professor Snedecor 
and we wish to express our thanks for his great help and leadership. 

The editor would also like to express his appreciation to members 
of the Editorial Board and all reviewers for their continued assistance 
and cooperation in the publication of Volume 14 of Biometrics. Part- 
icular thanks are due each year to Professor H. W. Norton who has 
constructed each annual Index to Biometrics. 


NEW MATHEMATICS DIVISION 

A newly-constituted Mathematics Division has absorbed the former 
Statistics Branch, Allied Sciences Division, Fort Detrick, Frederick, 
Maryland. Dr. Clifford J. Maloney, formerly Chief, Statistics Branch, 
has been appointed Chief, Mathematics Division. The reorganization 
was made partly for the purpose of further strengthening the use of the 
rational inductive approach in the research and development program 
of the Biological Warfare Laboratories. 


BIBLIOGRAPHY REVISION 


A revision is being made of “Bibliography of Nonparametric Statistics 
and Related Topics,” Journal of the American Statistical Association 48 
[1953], pp. 844-906. 

Material through 1959 is to be included with more emphasis, it is 
hoped, on applications than previously. References (particularly to 
the non-Inglish literature), reprints, and technical reports on the theory 
or applications of nonparametric statistics would be greatly appreciated. 
Also, corrections and additions to the original bibliography are desired. 

Material and information should be sent to I. Richard Savage, 
Statistics Department, University of Minnesota, Minneapolis, Min- 
nesota, U.S.A. 
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POSTDOCTORAL RESIDENT RESEARCIL ASSOCIATESHIPS 

The National Academy of Sciences —National Research Council has 
announced a program of Postdoctoral Resident Research Associateships 
to be offered for 1959-1960. The participating laboratories are the 
National Bureau of Standards (Boulder, Colorado and Washington, 
D. C.); the Naval Ordnance Laboratory (White Oak, Silver Spring, 
Maryland); the Naval Research Laboratory (Washington, D. C.); the 
Navy Electronics Laboratory (San Diego, California); and the U. S. 
Army Chemical Corps Biological Warfare Laboratories (ort Detrick, 
Frederick, Maryland). 

The Air Research and Development Command ‘3 also participating in 
this program at four Air Force installations. These associateships are 
tenable at Air Force Cambridge Research Center (Bedford, Massachu- 
setts); Air Force Missile Development Center (Alamogordo, New Mexico); 
Rome Air Development Center (Rome, New York); and Wright Air 
Development Center (Dayton, Ohio). In addition, the ARDC is sponsor- 
ing a program of Postdoctoral University Research Associateships tenable 
at twenty-one universities in the United States. 

The resident research associateships have been established to provide 
young scientists of unusual ability and promise an opportunity for 
advanced training in basie research in a variety of fields. Modern 
facilities are available in specified areas of the biological, physical, and 
mathematical sciences, and engineering. In addition to the above re- 
search in certain areas of psychology is available. 

Applicants must be citizens of the United States. They also must 
produce evidence of training in one of the listed fields equivalent to 
that represented by the Ph.D. or Sc.D. degree and must have demon- 
strated superior ability for creative research. Remuneration for these 
associateships is from $5985 to $7510 a year subject to income tax. 

Application materials may be secured by writing to the Fellowship 
Office, National Academy of Sciences—National Research Council, 2101 
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sce analysis of variance, determi- 
nantal equation, discriminant fune- 
tion, factor analysis, matrices, prin- 
cipal component 
computation, 6 
Nomography, 154 
Nonparametric tests, 435, 438 
Normal distribution, see normality 
cumulative, see probit transformation 
integrated, sce probit transformation 
Normal equations, 501, and see least 
squares, matrices 
Normality, non-, 71, 426, 437 
Nutrition, 16, 441 
Obituaries, 451 
Ordering, 563, 564, 565, and see tests 
Organolepsis, 39, 548, 566, and see scores, 
subjective evaluation, triangle test 
Ornithology, 385 
Orthogonal polynomials, 506 
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unequal intervals, 287 
Orthogonal squares, 499 
Palatability, see organolepsis 
Perception, see organolepsis 
Pharmacology, 107, 135, 141, 144, 436, 
and see analgesia, bioassay, toxi- 
cology 
Physical science, 33, 388 
Physiology, 131, 436, and see endocri- 
nology, medicine 
Placebo, 21, 135 
Plot size and shape, optimum, 219 
relation to variability, 207 
Poisson distribution, 51, 161, 230, 434, 
558 
bivariate, 57 
truncated, 175 
Pooling, 25 
Populations, see distributions, fish, statis- 
tical genetics 
estimates, 397 
finite, 210 
management, 159, and see ecology 
oscillating, 171 
stationary, 161 
structure, 160 
Precision, 505, and see information, 
least squares 
maximum, 506 
Prediction, 143, and see crop forecasting, 
regression 
Preferences, 39, and see organolepsis 
Probability, 24, 513, 568 
generating function, 229 
transition, 533 
Prony, 567 
Psychology, 435 
Public health, 195, 438 
Quadratic forms, 208, 412, 485 
Quality, see organolepsis 
Quantification, see scales 
Questionnaires, 46, 132, 252, 401, and 
see sample surveys 
Randomization, 398, 430, and see selection 
Random process, see stochastic 
Ranks, 19, and see scores, transforma- 
tions 
Recombination, 135 
Recurrence formulas, 232 
Regression, see adjusted means, canoni- 
cal analysis, correlation, covariance 
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orthogonal polynomials, trend 
adjustment, 116 
analysis, 290, 499, and see analysis of 
covariance correlated unequal errors, 
208 
coefficient, 111, 564 
estimation of, 207 
variance of, 111 
departure from, 211 
deviations from, 211 
homogeneity of, 3 
intercept, 111 
model, see structure 
multiple, 435, 553 
polynomial, optimum design, 573 
Reliability, system, 568 
Replication, fractional, 431 
Residuals, see error. 
Response, see bioassay, models, time 
response 
bias, 132, 401 
quantal, 293, 453, 462, 548 
Sample size needed, 426, and see organo- 
lepsis 
Sample surveys, 93, 141, and see con- 
sumer testing, sampling 
biased, 251, 403 
interview, 252 
non-response, 250 
Sampling, 78, and see components fof 
variance, design of experiments, 
sample size needed, sample surveys 
acceptance, 177 
area, 251 
ecological, 132, 385 
efficiency, 80 
error, 253, and see variance 
studies of statistical problems, 398, 
567, and see Monte Carlo 
systematic, 81 ‘ 
Scales, 33, 436, and see metameter, scores 
Scores, see discriminant function, meta- 
meter, organolepsis, ranks, scales 
distribution of, 568 
maximum likelihood, 186 
Screening tests, 311, and see bioassay 
Selection, see balancing, choice of trans- 
formation, design of experiments, 
genetic selection, randomization, 
sampling 
of subjects, 438 
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Sensitivity data, see quantal response 

Sensory tests, sce organolepsis 

Sequential experiments, 132, 141, 22, 
563, and see decision 

multiple-decision, 408 

Serology, 143, and see hematology 

Simultaneous equations, see matrices 

Singular equations, 284 

Soil heterogeneity, 207 

Standard deviation, see variance 

Standard error, see variance 

Statistical control, see analysis of co- 
variance 

Stochastic processes, 1, 10, 133, 441 

Structural analysis, 125 

Structure, 123 

Subjective evaluation, 33, and see organ- 
olepsis 

Successive approximation, see iteration 

Sufficient statistics, 455, and sce effici- 
ency, estimation 

questionnaires, 


Surveys, see 
surveys 

Survival curve, see time-response curve 

Survival ‘ime, see time-response curve 


sample 


System reliability, 568 
Tables, miscellaneous, 270, 371, 423, 515 
Taste tests, see organolepsis 
Taxonomy, 291, 440 
Tests, see analysis of variance, chi 
square, confidence limits, duo-trio, 
F test, goodness of fit, life testing, 
likelihood ratio, null hypothesis, 
ordering, ranks, sequential, tri- 
angle test, t test 
destructive, see life tests 
exact, 3 
life, 567 
likelihood ratio, 200 
of significance, 10, 112 
of significance of variance, 437 
power of, 426 
psychological, 440 
Theory, see biometry, hypothesis, model 
Time-response curve, 195, 202, 293, 463, 
529, 569 
Time series, 1 
Tournaments, 568 
Toxicology, 168, 
pharmacology 
Trarsect, O85 


and see bioassay, 
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Transfer method, 445 
‘Transformations, 6, 34, and see additiv- 
ity, analysis of variance, bioassay, 
exnnonical analysis, discriminant 
function, logistic curve, matrices, 
models 
choice of, 466 
effect of, 34 
linear, 123 
logarithmic, 109 
logit, 456 
probability, 19 
probit, 453, 549 
ridit, 18 
Tree crops, 140, 142, and sce crop fore- 
casting 
Trend, see regression 
Triangle test, 566 
7’ test, 568, and see confidence limits 
‘Tuberculosis, 144, 527 
Uniformity data, 207 
Variance, see covariance, / test, genetics 
analysis of, 70, 334, 503, and sce 
additivity, analysis of covariance, 
analysis of dispersion, analysis of 
groups of experiments, chi square, 
components of variance, degrees 
of freedom, F' test, interaction, 
least squares, missing values, 
models, multivariate analysis, 
orthogonal polynomials, regres- 
sion, structural analysis, tests, 
transformations, uniformity data 
incomplete data, 360 
interpretation of, 137, 334 
components, 70, 209, 439, 492, 572, 
and see components of covariance 
structural analysis 
variance of, 9 
heterogeneity of, 7-4 
minimum, estimation, 211 
of adjusted mean, 116 
of estimate, 201 
of mean difference, 379, 505 
test of, 565 
Vector, latent, 9 
Weber fraction, 548 
Weighting, 118, 207, 287, 471, 574 
Working probit, 483 
Yield forecasting, see crop forecasting 
Z test, see F’ test 
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